Two amino acid mutations in an anti-human CD3 single chain Fv 
antibody fragment that affect the yield on bacterial secretion but 
not the affinity 
Recombinant antibody fragments directed against cell surface
antigens have facilitated the development of novel 
therapeutic agents. As a first step in the creation of cytotoxic 
immunoconjugates, we constructed a single-chain Fv frag- 
ment derived from the murine hybridoma OKT3, that 
recognizes an epitope on the ε-subunit of the human CD3 
complex. Two amino acid residues were identified that are 
critical for the high level production of this scFv in 
Escherichia coli. First, the substitution of glutamic acid 
encoded by a PCR primer at position 6 of V H framework 
1 by glutamine led to a more than 30-fold increase in the 
production of soluble scFv. Second, the substitution of 
cysteine by a serine in the middle of CDR-H3 additionally 
doubled the yield of soluble antibody fragment without 
any adverse effect on its affinity for the CD3 antigen. The 
double mutant scFv (Q,S) proved to be very stable in vitro: 
no loss of activity was observed after storage for 1 month 
at 4°C, while the activity of scFv containing a cysteine 
residue in CDR-H3 decreased by more than half. The 
results of production yield, affinity, stability measurements 
and analysis of three-dimensional models of the structure 
suggest that the sixth amino acid influences the correct 
folding of the V H domain, presumably by affecting a folding 
intermediate, but has no effect on antigen binding. 
Keywords: affinity/anti-human CD3/bacterial expression/ 
single-chain Fv/solubility 



Introduction 

In recent years, the use of genetic engineering techniques has 
stimulated the development of antibody-like molecules for 
therapeutic and diagnostic uses (Winter and Milstein, 1991). 
Unlike glycosylated whole antibodies, fragments such as Fab 
and Fv can be easily produced in bacterial cells as functional 
antigen binding molecules (Better et al., 1988; Skerra and 
Plückthun, 1988). To stabilize the association of the recombinant
V H and V L domains, they have been linked in a single-chain
Fv (scFv) construct with a short peptide that connects 
the carboxy terminus of one domain and the amino terminus 
of the other (Bird et al, 1988; Huston et al, 1988). In 
comparison with the much larger Fab, F(ab')2 and IgG forms 
of monoclonal antibody from which they are derived, scFvs 
have more rapid blood clearance and better tumor penetration 
(Milenic et al., 1991; Yokota et al., 1992; Adams et al., 1993). 
ScFvs therefore represent potentially highly useful molecules 
for the targeted delivery of drugs, toxins or radionuclides to a 

The efficient expression of active antibody fragments in 
bacteria is clearly of great technological importance. However, 
as with the expression of some other heterologous proteins in 
Escherichia coli, the yield of functional product for some 
antibody fragments can be very low. Sometimes, PCR primer-induced
errors can lead to the expression of non-reactive 
antibody fragments (McCartney et al., 1995). Poor expression 
may also arise from differences in the translation machinery 
and folding pathways of eukaryotic and bacterial cells. For 
example, some nucleotide sequences encoding antibody vari- 
able regions were expressed as functional proteins in eukaryotic 
host cells but were unable to express a product in bacteria 
(Duenas et al, 1995). Limiting factors for the efficient produc- 
tion of secreted antibody fragments in E.coli appear to be 
translocation to the periplasm (Ayala et al, 1995) and folding 
in the periplasmic space (Knappik and Plückthun, 1995). 

OKT3 is a murine monoclonal antibody (mAb) that recognizes
an epitope on the ε-subunit of the human CD3 complex 
(Kung et al., 1979; Van Wauwe et al., 1980; Transy et al., 
1989). It has significant clinical utility. OKT3 has been widely 
used to suppress T cells and thereby prevent the rejection of 
transplants (Thistlethwaite et al, 1984; Woodle et al, 1991). 
Conversely, T cell activation and proliferation induced by 
OKT3 have been exploited to expand effector cells ex vivo 
for adoptive cancer immunotherapy (Yannelly et al., 1990). 
As well as being used alone, the OKT3 mAb has been used 
as a component of bispecific antibodies to retarget cytotoxic 
T lymphocytes against tumor cells (Nitta et al., 1990; Bohlen 
et al., 1993) or virus infected cells (Sanna et al., 1995). 
Recently, humanized versions of the OKT3 mAb have been 
expressed in COS cells (Woodle et al., 1992; Adair et al., 1994). 

In this paper, we present the first example of the expression 
of an OKT3 derived scFv in E.coli. As part of the anti-CD3 
scFv construction process, the PCR amplified OKT3 V H gene 
was modified to improve its in vivo folding. Here we analyze 
the effect of two amino acid residues in the variable heavy 
chain domain on the yield, affinity and stability in vitro of 
anti-CD3 scFv. 

Materials and methods 

E.coli strains, plasmids and cell lines 

E.coli K12 strain XL1-Blue (Stratagene, La Jolla, CA) was used 
as the cloning and expression host. For cloning, sequencing 
hybridoma-derived immunoglobulin variable regions and site-specific
mutagenesis, pCR-Script SK(+) (Stratagene) was 
used. The scFv gene was assembled and expressed either in 
the plasmid pOPE51 (Kipriyanov et al., 1994) or in pHOG21 
(Kipriyanov et al., 1996b). The hybridoma OKT3 producing 
a monoclonal antibody (IgG2a) against the CD3 human T cell 
antigen has been described previously (Kung et al., 1979; Van 
Wauwe et al., 1980). The human CD3-positive acute T cell 
leukemia cell line Jurkat and a CD3-negative B cell line JOK-1 
were used for flow cytometry. 
Cloning of the variable regions 

Isolation of mRNA from freshly subcloned hybridoma OKT3 
cells and cDNA synthesis were performed as previously 
described (Dübel et al., 1994). DNA coding for the light chain 
variable domain was amplified by PCR using the primers Bi5 
and Bi8 that hybridize to the amino terminal portion of the κ 
chain constant domain and the framework 1 (FR1) region of 
the κ chain variable domain (Dübel et al., 1994). For the 
amplification of DNA coding for the heavy chain variable 
domain, the primer Bi4 that hybridizes to the amino terminal 
portion of the γ chain constant 1 domain (Dübel et al., 1994) 
and Bi3f that hybridizes to the FR1 region of the heavy chain 
(Gotter et al., 1995; Kipriyanov et al., 1996b) were used. The 
50 µl reaction mixture contained 10 pmol of each primer and 
50 ng of hybridoma cDNA, 100 µM of each dNTP, 1× Vent 
buffer (Boehringer Mannheim, Mannheim, Germany), 5 µg 
BSA and 1 U Vent DNA polymerase. Thirty cycles of 1 min at 
95°C, 1 min at 55°C and 2 min at 75°C were carried out in a 
thermocycler. The amplified DNA was purified with a QIAquick
PCR Purification Kit (Qiagen, Hilden, Germany) and 
blunt end ligated into an SrfI digested pCR-Script SK(+) 
(Stratagene) for dideoxy sequencing (Sanger et al., 1977) and 
site-specific mutagenesis. 
Construction of plasmids encoding scFv 
The linker used in this study was a 17 amino acid tag-linker 
that includes a tubulin epitope recognized by mAb YOL1/34 
(Breitling et al., 1991). DNA coding for the variable domains 
of OKT3 was inserted into pOPE51 (Kipriyanov et al., 1994) 
in two cloning steps using NcoI/HindIII for the heavy-chain 
DNA and EcoRV/BamHI for the light-chain DNA. The whole 
scFv gene was recloned in pHOG21 (Kipriyanov et al., 1996b) 
as an NcoI/BamHI DNA fragment. 
Construction of anti-CD3 mutants 

Mutations were generated in the V H domain derived from 
OKT3 by site-specific mutagenesis according to Kunkel et al. 
(1987). The amino acid substitution of Cys at position H100A 
by Ser and of Glu at position H6 by Gln was achieved using 
either primer SK1 5'-GTAGTCAAGGCTGTAATGATCATC 
or SK2 5'-GCCCCAGACTGCTGCAGCTGCAC or both. 
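
As a quick consistency check on the two mutagenic primers, the sketch below reverse-complements each oligonucleotide and translates the result with the standard genetic code. It assumes, which the text does not state explicitly, that both primers are antisense to the V H coding strand and that the reading frame starts at the first base of the reverse complement; the helper names are illustrative only.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def translate(seq: str) -> str:
    """Translate complete codons from the first base; trailing bases are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Primer sequences as given in the text (5' to 3').
SK1 = "GTAGTCAAGGCTGTAATGATCATC"  # intended change: Cys -> Ser at H100A
SK2 = "GCCCCAGACTGCTGCAGCTGCAC"   # intended change: Glu -> Gln at H6

# Assumption: primers are antisense, frame starts at base 1 of the reverse complement.
sk1_peptide = translate(reverse_complement(SK1))
sk2_peptide = translate(reverse_complement(SK2))
print("SK1 encodes:", sk1_peptide)
print("SK2 encodes:", sk2_peptide)
```

Under these assumptions SK1 reads DDHYSLDY, with Ser in the middle of the stretch, and SK2 reads VQLQQSG, which places Gln at position 6 if the first codon is taken as Val2 of the N-terminal QVQLQ sequence.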
E.coli expression and purification of scFv fragments 
XL1-Blue E.coli cells (Stratagene) transformed with the scFv 
expression plasmid pHOG21 were grown overnight in 2×YT 
medium with 50 µg/ml ampicillin and 100 mM glucose 
(2×YT GA) at 37°C. Dilutions (1:50) of the overnight cultures 
in 2×YT GA were grown as flask cultures at 37°C with shaking 
at 200 r.p.m. When cultures reached OD600 = 0.8, bacteria 
were pelleted by centrifugation at 1500 g for 10 min at 20°C 
and resuspended in the same volume of fresh 2×YT medium 
containing 50 µg/ml ampicillin and 0.4 M sucrose. IPTG was 
added to a final concentration of 0.1 mM and growth was 
continued at room temperature (20-22°C) for 20 h. The cells 
were harvested by centrifugation at 5000 g for 10 min and 
4°C. The culture supernatant was retained and kept on ice. To 
isolate soluble periplasmic proteins, the pelleted bacteria were 
resuspended in 5% of the initial volume of ice-cold 50 mM 
Tris-HCl, 20% sucrose, 1 mM EDTA, pH 8.0. After a 1 h 
incubation on ice with occasional stirring, the spheroplasts 
were centrifuged at 30 000 g for 30 min and 4°C leaving the 



soluble periplasmic extract as the supernatant and spheroplasts 
plus the insoluble periplasmic material as the pellet. The 
culture supernatant and the soluble periplasmic extract were 
combined, clarified by additional centrifugation (30 000 g, 
4°C, 40 min) and passed first through a glass filter of pore 
size 10-16 µm and then through a Membrex TF filter of pore 
size 0.2 µm (MembraPure, Lörzweiler, Germany). The volume 
was reduced 10-fold by concentration with Amicon YM 10 
membranes (Amicon, Witten, Germany). The concentrated 
supernatant was clarified by centrifugation and thoroughly 
dialyzed against 50 mM Tris-HCl, 1 M NaCl, pH 7.0 at 
4°C. Immobilized metal affinity chromatography (IMAC) was 
performed at 4°C using a 5 ml column of Chelating Sepharose 
(Pharmacia) charged with Ni2+ and equilibrated with 50 mM 
Tris-HCl, 1 M NaCl, pH 7.0 (start buffer). The sample was 
loaded onto the column, which was then 
washed with 20 column volumes of start buffer followed by 
start buffer containing 50 mM imidazole until the absorbance 
(280 nm) of the effluent was minimal (about 30 column 
volumes). Adsorbed material was eluted with 50 mM Tris-HCl,
1 M NaCl, 250 mM imidazole, pH 7.0. After buffer 
exchange to 50 mM MES, pH 6.0, the protein was further 
purified on a Mono S ion-exchange column (Pharmacia). The 
purified scFv was dialyzed into PBS (15 mM sodium phosphate, 
0.15 M NaCl, pH 7.4). For long-term storage, scFv were 
frozen in the presence of BSA (final concentration 10 mg/ml) and 
kept at -80°C, as recommended (Kipriyanov et al., 1995). 

Isolation of scFv from inclusion bodies of bacteria transformed
with plasmid pOPE51 was performed essentially as 
described previously (Kipriyanov et al., 1996a). 
SDS-PAGE and Western blot analysis 

SDS-PAGE was carried out according to Laemmli (1970) 
under reducing conditions. Immunoblot analysis using anti 
c-myc mouse mAb 9E10 (Cambridge Research Biochemicals, 
Cambridge, UK) was performed as described previously 
(Kipriyanov et al, 1994). 
Analyses of scFv stability 

For stability analyses, scFv preparations were stored at 4°C at 
a concentration of 50 µg/ml in PBS for 1 month. The activities 
of samples after storage were determined by flow cytometry. 
Flow cytometry 

We incubated 5×10^5 CD3+ Jurkat or CD3- JOK-1 cells in 
50 µl RPMI 1640 medium (Gibco BRL, Eggenstein, Germany) 
supplemented with 10% fetal calf serum (FCS) and 0.1% 
sodium azide (referred to as complete medium) with 100 µl 
of a sample containing scFv for 45 min on ice. After washing 
with complete medium, the cells were incubated with 100 µl 
of 10 µg/ml anti c-myc mAb 9E10 (ICI Biochemicals) in the 
same buffer for 45 min on ice. After a second washing cycle, 
the cells were incubated with 100 µl of FITC-labeled goat 
anti-mouse IgG (Gibco BRL) under the same conditions as 
before. The cells were then washed again and resuspended in 
100 µl of a 1 µg/ml solution of propidium iodide (Sigma, 
Deisenhofen, Germany) in complete medium to exclude dead 
cells. The relative fluorescence of stained cells was measured 
using a FACScan flow cytometer (Becton Dickinson, Mountain 
View, CA). 

Measurement of binding affinity 

Affinities were derived either from the FACScan analysis of 
direct binding of scFv to Jurkat cells as described by Chamow 
et al. (1994) or from a competitive inhibition assay. In the 
latter case, increasing concentrations of scFv were added to 
a subsaturating concentration of FITC-labeled mAb OKT3 
(7.4 nM) and were incubated with Jurkat cells as described 
above. Fluorescence intensities of stained cells were measured 
as described above. Binding affinities were calculated accord- 
ing to the following equation derived from that of Schodin 
and Kranz (1993): 

$$K_{a(I)} = \frac{1 + [\mathrm{FITC\text{-}OKT3}] \times K_{a(\mathrm{OKT3})}}{IC_{50}}$$

where I is the unlabeled inhibitor (scFv), [FITC-OKT3] is the 
concentration of FITC-labeled mAb OKT3, $K_{a(\mathrm{OKT3})}$ is the 
binding affinity of mAb OKT3 ($1.2 \times 10^{9}$ M$^{-1}$; Adair et al., 
1994) and $IC_{50}$ is the concentration of inhibitor that yields 
50% inhibition of binding. 
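
The competition equation can be evaluated directly. The following minimal sketch uses the FITC-OKT3 concentration (7.4 nM) and the OKT3 affinity (1.2×10^9 M^-1) quoted in the text as defaults; the IC50 value in the example is purely hypothetical and is not a measurement from this study, and the function name is illustrative.

```python
def inhibitor_affinity(ic50_molar: float,
                       tracer_molar: float = 7.4e-9,
                       ka_tracer: float = 1.2e9) -> float:
    """Affinity constant (M^-1) of the unlabeled inhibitor from a competition assay.

    ic50_molar   -- inhibitor concentration giving 50% inhibition (M)
    tracer_molar -- concentration of FITC-labeled mAb OKT3 (M)
    ka_tracer    -- affinity constant of mAb OKT3 (M^-1)
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be positive")
    return (1.0 + tracer_molar * ka_tracer) / ic50_molar


# Hypothetical IC50 of 0.5 µM, chosen only to illustrate the arithmetic.
example_ic50 = 5.0e-7
ka_scfv = inhibitor_affinity(example_ic50)
print(f"Ka(scFv) = {ka_scfv:.3e} M^-1")
```

With the defaults, the numerator is 1 + 8.88 = 9.88, so the hypothetical IC50 of 0.5 µM corresponds to an affinity constant of about 2×10^7 M^-1.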

Determination of the yield of soluble antibody fragments 
The expression levels of soluble scFv fragments were deter- 
mined in cleared culture medium and in crude periplasmic 
extracts isolated from shake-tube mini-cultures (5 ml). Culture 
supernatants were concentrated 20-fold using an Ultrafree-15 
Biomax-10 centrifugal filter device (Millipore, Bedford, MA, 
USA) and dialyzed into PBS. The periplasmic extracts from 
cell pellets were prepared as described previously (Kipriyanov 
et al., 1996b). For each scFv variant, three independent 
expression cultures were used. The concentrations of functional 
recombinant antibody fragments were determined from the 
fluorostaining of Jurkat cells using samples of periplasmic 
preparations and concentrated culture medium by the interpola- 
tion of their mean fluorescence intensities on the standard 
curves obtained with purified scFv of known concentration. 
At least four dilutions of samples were used for calculations. 
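
The interpolation step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' procedure: it assumes piecewise-linear interpolation between standards and a monotonically increasing standard curve, and all numbers are invented for the example.

```python
def interpolate_concentration(mfi, standards):
    """Interpolate a concentration from a mean fluorescence intensity (MFI).

    standards -- list of (concentration, MFI) pairs for purified scFv;
                 MFI is assumed to increase strictly with concentration.
    """
    points = sorted(standards, key=lambda p: p[1])
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= mfi <= f_hi:
            fraction = (mfi - f_lo) / (f_hi - f_lo)
            return c_lo + fraction * (c_hi - c_lo)
    raise ValueError("MFI lies outside the range of the standard curve")


def sample_concentration(dilution_mfi, standards):
    """Average the back-calculated concentration over several dilutions.

    dilution_mfi -- mapping of dilution factor to measured MFI
    """
    estimates = [
        factor * interpolate_concentration(mfi, standards)
        for factor, mfi in dilution_mfi.items()
    ]
    return sum(estimates) / len(estimates)


# Hypothetical standard curve: (µg/ml of purified scFv, MFI).
standards = [(0.25, 10.0), (0.5, 20.0), (1.0, 40.0), (2.0, 70.0), (4.0, 100.0)]
# Hypothetical sample measured at four dilutions (dilution factor -> MFI).
sample = {1: 85.0, 2: 55.0, 4: 30.0, 8: 15.0}

estimate = sample_concentration(sample, standards)
print(f"Estimated functional scFv: {estimate:.2f} µg/ml")
```

Averaging over several dilutions, as the text requires at least four, guards against a single reading that falls on a flat part of the standard curve.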
Molecular modeling 

Modeling was performed using AbM (Oxford Molecular, 
Oxford, UK). The framework was built by homology using 
HyHEL-5 (Sheriff et al., 1987) for the parent light chain and 36-71
(Strong et al., 1991) for the heavy chain. The complementarity 
determining regions (CDR) L1, L2, L3, H1 and H2 were built 
using canonical classes as proposed by Chothia et al. (1989) 
(CDR-L1 = Class 1, CDR-L2 = Class 1, CDR-L3 = Class 1, 
CDR-H1 = Class 1, CDR-H2 = Class 2) while CDR-H3 was 
built using the CAMAL algorithm (Martin et al., 1989). 

AbM sometimes has problems with junction regions where 
loops are spliced on to the framework. This can result in 
trigonal planar or D-amino acids at these junction sites. This 
occurred for residue H102 and this residue was rebuilt manually 
as an L-amino acid. 
Other methods 

Protein concentrations were determined by the Bradford dye- 
binding assay (Bradford, 1976) using the Bio-Rad protein 
assay kit (Bio-Rad Laboratories, Munich, Germany). The 
concentrations of purified scFv were calculated from the A280 
values using the extinction coefficient ε(1 mg/ml) = 1.84 derived 
from the Trp, Tyr and Phe content of the molecule using 
DNAid+1.8 Sequence Editor for Macintosh (F.Dardel and 
R.Bensoussan, Laboratoire de Biochimie, Ecole Polytechnique, 
Palaiseau, France). Analytical gel filtration of the scFv preparation
was performed in PBS using a Superdex 75 HR10/30 
column (Pharmacia). The sample volume and flow rate were 
200 µl and 0.5 ml/min, respectively. For calibration of the 
column, a Low Molecular Weight Gel Filtration Calibration 
Kit (Pharmacia) was used. 
Fig. 1. Critical features of the OKT3 antigen binding site. The molecular 
model of OKT3 Fv is shown as a Cα trace with side chains of amino acid 
residues H100A and H6. 
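
The A280-based concentration estimate given under "Other methods" is a single division. The sketch below assumes that the coefficient 1.84 is the absorbance of a 1 mg/ml solution in a 1 cm cuvette; the path length is an assumption, since it is not stated in the text, and the absorbance reading is hypothetical.

```python
def scfv_concentration_mg_per_ml(a280: float,
                                 ext_coeff_1mg_ml: float = 1.84,
                                 path_cm: float = 1.0) -> float:
    """Protein concentration (mg/ml) from absorbance at 280 nm.

    ext_coeff_1mg_ml -- A280 of a 1 mg/ml solution per cm of path length
    path_cm          -- cuvette path length in cm (assumed 1 cm here)
    """
    if ext_coeff_1mg_ml <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return a280 / (ext_coeff_1mg_ml * path_cm)


# Hypothetical reading of A280 = 0.92 in a 1 cm cuvette.
concentration = scfv_concentration_mg_per_ml(0.92)
print(f"{concentration:.2f} mg/ml")
```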

Results 

Modification of PCR amplified OKT3 V H gene 
The V region genes of the murine monoclonal antibody OKT3 
(Van Wauwe et al., 1980) were amplified by PCR from a cDNA 
preparation using two pairs of highly degenerate primers. Ten 
clones of each amplified V H and V L gene were sequenced and 
found to be identical. All the analyzed heavy chain variable 
regions contained QVQLQE as the N-terminal sequence. This 
region was encoded by the 5' primer Bi3f that contained a 
degenerate codon corresponding either to glutamic acid or 
glutamine residue at position 6 (Gotter et al., 1995; Kipriyanov 
et al., 1996b). The OKT3 scFv gene was assembled in the 
plasmid pOPE51 (Kipriyanov et al., 1994) and expressed in 
E.coli. The resulting recombinant scFv product contained an 
unpaired cysteine residue near to its C-terminus that was 
specially introduced for making bivalent antibodies by 
chemical conjugation or for site-specific biotinylation 
(Kipriyanov et al., 1994). 

FACScan analysis demonstrated no binding of scFv-OKT3 
isolated from periplasmic inclusion bodies (Kipriyanov et al., 
1994; Kipriyanov et al., 1995) to CD3-positive Jurkat cells 
(data not shown). A detailed analysis of the predicted structure 
based on the OKT3 V domain sequences allowed us to identify 
two amino acid residues in the V H domain that might be 
critical for the activity of this recombinant antibody (Figure 
1). First, a comparison with the OKT3 cDNA sequence (Adair 
et al., 1994) showed that position 6 of FR1 was occupied by 
glutamine and not by glutamic acid as in the PCR amplification 
product. Furthermore, the consensus sequences of the Kabat 
database demonstrated that the cloned V H gene fragment 
belongs to mouse immunoglobulin subgroup IIb (Kabat et al., 
1991), in which 92% of the members have Q in position 6. 
Second, the OKT3 V H domain was found to contain a cysteine 
residue in the CDR-H3 which could interfere with folding by 
disrupting normal disulfide bonding, or might be oxidized 
during IMAC on an Ni column under denaturing conditions 
(Kipriyanov et al., 1994). Therefore, we performed site-specific 
mutagenesis of the V H gene to substitute E6 by Q and C100A 
[numbering scheme of Kabat et al. (1991)] by S. This double-mutant
scFv-dmOKT3 (Q,S) demonstrated strong binding to 
CD3-positive Jurkat cells and no interaction with CD3-negative 
JOK-1 cells when purified from inclusion bodies (data not 
shown). 
Construction and expression of anti-CD3 scFv mutants 
To clarify how the amino acid changes described above 
contribute to the activity of the anti-CD3 scFv, we investigated 
four different scFv variants: a variant containing E6 and 
C100A that was amplified from hybridoma cDNA by PCR 
(E,C), a variant corresponding to the cDNA sequence published 
for OKT3 (Q,C; Adair et al., 1994) and two variants containing 
Ser instead of Cys at V H position 100A (E,S and Q,S). 

To avoid working with inclusion bodies, which have to be 
refolded, and to prevent vector-derived C-terminal unpaired 
cysteines from affecting the scFv properties (e.g. possible 
formation of an additional intramolecular disulfide bond with 
Cys-100A or scFv dimerization), we chose the plasmid 
pHOG21 for expressing the mutated scFv genes (Figure 2A). 
The bacterial pHOG21 expression vector was designed for 
the high-level production of soluble recombinant antibody 
fragments in E.coli (Kipriyanov et al., 1996b). The antibody 
V H fragment is preceded by a pelB leader sequence for 
secretion of recombinant antibody into the periplasmic space. 
The C-terminus of the V H domain and N-terminus of the V L 
domain are joined by a flexible 17 amino acid tag-linker that 
includes a tubulin epitope recognized by mAb YOL1/34 
(Breitling et al., 1991). A short peptide tag containing an 
epitope of the proto-oncogene c-myc recognized by mAb 9E10 
(Evan et al., 1985) is located at the C-terminus of the V L 
domain followed by six histidine residues to facilitate the 
isolation of recombinant antibody fragments by IMAC. The 
sequence of the OKT3 derived scFv assembled in the plasmid 
pHOG21 is shown in Figure 2B with the mutations at amino 
acid positions 6 and 100A of the heavy chain indicated. 

Recently, we showed that the addition of 0.4 M sucrose to 
the growth medium gives a 15-25-fold increase in the yield 
of soluble scFv for bacterial shake-tube cultures and an 80- 
150-fold increase for shake-flask cultures (Kipriyanov et al., 
1997). We also found that the scFv could be made to accumulate 
in the periplasm or be secreted into the medium by simply 
changing the incubation conditions and the concentration of 
the inducer. Therefore, to obtain higher yields of soluble anti- 
CD3 antibody fragments, we incubated induced E.coli cells in 
the presence of 0.4 M sucrose. Western blot analysis of cell 
pellets and periplasmic extracts of bacterial cultures expressing 
the four variants of OKT3 derived scFv demonstrated substan- 
tial differences in the ratio of soluble and total scFv (Figure 
3). While the total amount of recombinant product found in 
the cell pellet seemed to be equal for all scFv variants, much 
less soluble scFv was found for variants containing Glu at 
position 6 (Figure 3, lanes 2 and 6). FACScan analysis 
demonstrated the specific binding of periplasmic extracts for 
all the anti-CD3 scFv variants to CD3-positive Jurkat cells, 
although the fluorescence intensity obtained for scFvs with E6 
was significantly lower (Figure 4A). 
Purification of anti-CD3 scFv variants 

To clarify whether the difference in antigen binding activity 
of periplasmic extracts containing different scFv variants 
(Figure 4A) is due to the difference in affinity or merely 
reflects the production levels of soluble antibody fragment, we 
performed a large-scale isolation of scFv using shake-flask 
bacterial cultures in the presence of 0.4 M sucrose. Under 
these conditions, we previously found that most of the secreted 
scFv was released into the medium (Kipriyanov et al., 1997). 
The supernatant and periplasmic content of the induced bacterial
culture was concentrated and passed through an Ni2+-
Fig. 2. The structure of scFv and expression vector. (A) Schematic 
representation of the plasmid pHOG21. ApR, ampicillin resistance-encoding 
gene; c-myc, a sequence encoding an epitope recognized by the monoclonal 
antibody 9E10; ColE1, origin of DNA replication; f1 IG, intergenic region of 
phage f1; His6, a sequence encoding six C-terminal histidine residues; 
linker, a sequence encoding 17 amino acids connecting the V H and V L 
domains; pelB, signal peptide sequence of bacterial pectate lyase; P/O, wt 
lac promoter/operator. (B) The nucleotide and deduced amino acid 
sequences of the scFv derived from hybridoma OKT3. The amino acid 
sequences corresponding to the complementarity determining regions (CDR) 
are shown shaded. The nucleotide sequences corresponding to PCR primers 
are underlined. The sequences coding for the YOL epitope in linker region 
as well as c-myc epitope and six histidines in the carboxy terminal part of 
the scFv are indicated. 

charged Chelating Sepharose column. After washing the 
column with buffer containing 50 mM imidazole, the bound 
scFv was eluted with 250 mM imidazole as a single peak in 
2.5 column volumes. This purification procedure allowed us 
to isolate scFv in one step with a purity of about 95% (Figure 
Fig. 3. Western blot analysis of cell pellets and periplasmic extracts from 
E.coli clones expressing different anti-CD3 scFv variants. Lanes: 1, 3, 5, 7, 
total cell lysate from induced bacteria corresponding to 100 µl of culture; 2, 
4, 6, 8, periplasmic extracts corresponding to 180 µl of culture. The scFv 
were detected using mAb 9E10 recognizing the C-terminal c-myc epitope. 
As a control, 1 µg of pure scFv-dmOKT3 isolated from inclusion bodies 



5A). The main contaminant present in samples of scFv purified 
by IMAC has recently been identified as an E.coli metal-binding
27 kDa WHP protein (Wülfing et al., 1994). An 
analysis of its amino acid composition showed that the WHP 
protein has an isoelectric point (pI) of 5.16; anti-CD3 scFv 
variants were found to be more basic (the calculated pI was 
between 7.27 for the E,C and 7.52 for the Q,S variant). This 
charge difference allowed us to purify the recombinant antibody 
fragments to homogeneity by ion-exchange chromatography 
on a Mono S column (Figure 5B). Analytical gel-filtration on 
a Superdex 75 column demonstrated that all the isolated scFv 
preparations consisted only of monomers (data not shown). 
Affinity and stability measurements 

Our attempts to use radioiodinated scFv preparations for 
measuring the direct binding of recombinant antibodies to 
CD3-positive Jurkat cells were unsuccessful. Unfortunately, 
iodination using chloramine-T yielded an inactive product for 
both anti-CD3 scFv and Fab fragment prepared from mAb 
OKT3 (data not shown). It is possible that iodination blocked 
tyrosine residues in the CDR regions that may be important 
for antigen-binding (Figure 2B). We therefore employed two 
different non-radioactive approaches based on flow cytometry 
(Bohn, 1980) that do not require any modification of the protein. 

In the first approach, recombinant antibody fragments were 
incubated with cells as in a standard radioprotein binding 
assay, except that an anti-c-myc mAb and fluorescent anti- 
mouse IgG reagent were used to detect the amount of bound 
scFv. In comparison with a standard radioligand binding assay, 
the same variables (except the number of molecules bound at 
saturation) can be measured and an affinity constant determined 
from the slope of the resultant Scatchard curve (Chamow 
et al., 1994). 
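
The principle of deriving an affinity constant from the slope of a Scatchard plot can be illustrated with synthetic data. The sketch below is not the analysis of Chamow et al. (1994); it assumes one-site binding, a fluorescence signal proportional to bound scFv, and free scFv approximately equal to the added concentration. All values are invented.

```python
def one_site_signal(conc_molar, ka, f_max):
    """Fluorescence signal for one-site binding at a given free scFv concentration."""
    return f_max * ka * conc_molar / (1.0 + ka * conc_molar)


def scatchard_ka(concs_molar, signals):
    """Estimate Ka (M^-1) as minus the slope of signal/free versus signal.

    For one-site binding, F/c = Ka*Fmax - Ka*F, so a least-squares line
    through (F, F/c) has slope -Ka.
    """
    xs = list(signals)
    ys = [f / c for f, c in zip(signals, concs_molar)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic titration: hypothetical Ka of 2e7 M^-1 and plateau signal of 500.
true_ka = 2.0e7
concentrations = [1e-8, 3e-8, 1e-7, 3e-7, 1e-6]
signals = [one_site_signal(c, true_ka, 500.0) for c in concentrations]

estimated_ka = scatchard_ka(concentrations, signals)
print(f"Estimated Ka = {estimated_ka:.3e} M^-1")
```

Because the fluorescence scale is arbitrary, the intercept does not give the number of binding sites, which matches the remark in the text that the number of molecules bound at saturation cannot be measured this way.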

The binding of scFv preparations was measured using human 
Jurkat cells as a source of naturally expressed cell bound 
CD3ε. Binding to CD3-negative JOK-1 cells was used as a 
negative control. The results of fluorostaining of Jurkat cells 
displayed in Figure 4B demonstrate that the same concentrations
of different scFv variants yield similar fluorescence 
(slightly higher values were obtained for variants containing 
Gln at position 6). A pattern of increased fluorescence with 
Fig. 4. Flow cytometric analysis of the binding of anti-CD3 scFvs to Jurkat 
cells. (A) Analysis of binding of periplasmic extracts. (B) Analysis of 
binding of pure scFv preparations at concentration 25 µg/ml. The presence 
of two peaks of fluorescence indicates that not all cells of the used line 
express CD3 antigen. As a negative control, binding to CD3-negative 
JOK-1 was used. 



increased amounts of scFv was observed that seems to reach 
a plateau at higher concentrations (Figure 6A). On the basis 
of fluorescence measurements at different concentrations of 
added scFv, typical Scatchard curves were generated from 
which Ka values were derived (data not shown). 

In a second approach, the binding efficiency of anti-CD3 
scFv variants to Jurkat T cells was investigated by competition 
with FITC-labeled mAb OKT3. The data presented in Figure 
6B demonstrate that the OKT3 derived scFv Q,C, E,S and 
Q,S variants competed similarly, all at ~100 times the concentration
of the intact IgG OKT3. 

Analysis of the stability of anti-CD3 scFv variants after 
storage in PBS for 1 month at 4°C demonstrated a substantial 
loss of antigen-binding activity for scFv containing Cys in 
CDR-H3 (Figure 7). 

Table I summarizes the results of the affinity and stability 
measurements. The apparent affinity values obtained for all 
the scFv variants proved to be quite close, indicating (i) only 
a slight effect of the sixth amino acid on the antigen binding 
and (ii) that the replacement of Cys by Ser in the middle of 
CDR-H3 does not disturb the antigen-antibody complex. Both 
the glutamic acid at position H6 and especially the cysteine 
at position H100A led to a decreased stability of the scFv, 



probably because of a higher tendency for such antibody 
fragments to aggregate and/or for oxidation of unpaired cyst- 
eine residues during storage. No proteolytic degradation during 
storage was detected for any of the examined scFv variants 
(data not shown). 
Analysis of expression yields 

To study the influence of positions H6 and H100A on the 
production levels of soluble scFv fragments, we analyzed the 
antigen binding activities of periplasmic extracts and the 
concentrated culture medium of bacteria expressing scFv E,C, 
Q,C, E,S and Q,S variants. The expression yield data presented 
in Table I demonstrate that a single amino acid substitution of 
E6 by Q yields more than a 30-fold increase in soluble scFv 
product. In contrast, the single exchange of C100A by S led 
to a more moderate twofold increase in soluble scFv. These 
effects were cumulative: the total yield of the Q,S variant was 
66-fold higher than that for the E,C scFv variant (Table I). 
For all the examined variants, a small proportion of the 
functional soluble scFv was found to be released into the 
culture medium. 
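
The fold increases quoted above follow directly from the mean yields in Table I. A short calculation, using only those mean values (µg/l of culture) and ignoring the reported standard deviations, makes the arithmetic explicit; the variable names are illustrative.

```python
# Mean yields of soluble scFv from Table I (µg per litre of culture).
yields = {"E,C": 72.7, "Q,C": 2314.7, "E,S": 148.4, "Q,S": 4846.0}

# Fold increase of each variant relative to the PCR-derived E,C variant.
fold_vs_ec = {variant: y / yields["E,C"] for variant, y in yields.items()}

# Effect of each substitution in the background of the other one.
e6q_in_ser_background = yields["Q,S"] / yields["E,S"]
c100as_in_gln_background = yields["Q,S"] / yields["Q,C"]

for variant, fold in fold_vs_ec.items():
    print(f"{variant}: {fold:.1f}-fold relative to E,C")
print(f"E6Q with Ser at H100A: {e6q_in_ser_background:.1f}-fold")
print(f"C100A-S with Gln at H6: {c100as_in_gln_background:.1f}-fold")
```

The E6Q effect is slightly above 30-fold and the C100A-to-Ser effect is about twofold in either background, so the two substitutions act roughly multiplicatively on the yield.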



Discussion 

Recombinant antibody fragments directed against cell surface 
antigens can provide useful components for the development 
of therapeutic agents. To target cytotoxic effector T cells to a 
tumor site, we have constructed an anti-human CD3 single- 
chain antibody by PCR amplification of the immunoglobulin 
variable domain genes from cDNA of the hybridoma OKT3. 
Expression of the assembled scFv gene in E.coli yielded a 
non-functional product after refolding from inclusion bodies. 

In general, the primers we and other workers use for 
amplifying V genes from hybridoma cDNAs are designed to 
match all the known sequences of immunoglobulin genes. 
However, PCR amplification using degenerate primers does 

[...] of OKT3 derived scFv [...]. (A) Fluorescence intensity obtained for fresh scFv 
at concentration 25 µg/ml. (B) Fluorescence intensity obtained 
for the same scFv preparations after storage in PBS for 1 month at 4°C. As 
controls, the interaction of culture medium, mAb OKT3 and irrelevant mAb 
HD20 with Jurkat cells is shown. 



not always yield a gene with naturally occurring codons in the 
primer region (McCartney et al., 1995). It is therefore often 
not possible to know which codons occur naturally if, as in 
our case, the DNA sequence was not then available. For 
example, the same set of primers resulted either in Glu or Gln 
in H6 after amplification of the V H gene of an antibody against 
human CD19 (Kipriyanov et al., 1996b). Regarding the 
significance of this position, there was no indication in the 
literature that it may be critical for bacterially expressed 
antibody fragments. 

To improve the properties of the recombinant antibody 
fragment, we focused on the amino acid residues which are 
structurally uncommon for the V H subgroup IIb: glutamic acid 
at the position H6 of FR1 and a cysteine in the middle of the 
CDR-H3 loop. Site-specific mutagenesis and a change of 
expression system (soluble secreted scFv versus inclusion 
bodies) allowed us to clarify their influence on the production 
of a functional scFv antibody fragment. 

We demonstrated that a single amino acid substitution of E 
by Q at position 6 of the heavy chain resulted in a 30-fold 
increase in soluble scFv product and significantly increased 
the stability of the recombinant molecule during storage. 
However, this substitution had very little effect on the affinity 
(scFv containing Q had affinity constants about 1.5 times 
higher than variants with E). This slight difference may be 
explained by the possible difference in the percentage of 
functional scFv (Kipriyanov et al., 1994). We can therefore 
conclude that the sixth amino acid influences the correct 
folding of the V H domain, perhaps by affecting some folding 
intermediate, but it has little or no effect on antigen binding. 
This conclusion was supported by computational molecular 
modeling. Examination of the residues which surround position 
6 of the heavy chain in the three-dimensional model reveals 
no reason why a Glu or Gln residue should have any significant 
effect on the conformations of the CDRs (Figure 1). 
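
The fold changes quoted in the text can be checked against the mean yields reported in Table I. The short sketch below only repeats that arithmetic; the dictionary keys are the variant names used in the table (residue at H6, residue at H100A).

```python
# Mean yields of soluble scFv (ug per litre of culture) from Table I
yield_ug_per_l = {"E,C": 72.7, "Q,C": 2314.7, "E,S": 148.4, "Q,S": 4846.0}


def fold(variant, reference):
    """Yield of one variant relative to a reference variant."""
    return yield_ug_per_l[variant] / yield_ug_per_l[reference]


# Effect of Glu -> Gln at H6, in each CDR-H3 background
glu_to_gln = {"Cys at H100A": fold("Q,C", "E,C"),
              "Ser at H100A": fold("Q,S", "E,S")}

# Effect of Cys -> Ser at H100A, in each H6 background
cys_to_ser = {"Glu at H6": fold("E,S", "E,C"),
              "Gln at H6": fold("Q,S", "Q,C")}

for label, value in {**glu_to_gln, **cys_to_ser}.items():
    print(f"{label}: {value:.1f}-fold")
```

Both H6 comparisons come out slightly above 30-fold and both H100A comparisons at about twofold, so the two substitutions act roughly independently on yield.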

It is not clear how Glu may affect the folding of the scFv 
fragment in the bacterial environment because very little is 
known about this process. Attempts have been made to prevent 
the side reaction of aggregation by overexpressing some 
known enzymes of the E.coli folding machinery such as 
the GroES/L chaperones, disulfide-isomerase and proline 
cis-trans-isomerase (Knappik et al., 1993; Duenas et al., 1994). 
However, these proteins did not increase the yield of soluble 
antibody fragments. The presence of a periplasmic chaperone 
has therefore been postulated but not yet identified (Wülfing 
and Plückthun, 1994). From a variety of experiments, evidence 
is accumulating that the primary sequence of the antibody 
plays a decisive role in the efficiency of folding in a bacterial 
environment (Carter et al., 1992; Knappik and Plückthun, 
1995). Our own results lead to a similar conclusion. The amino 
acid H6 influences not only the folding efficiency but also the 
stability of the correctly folded scFv. 

Table I. Expression levels of anti-CD3 scFv variants, their 
stabilities and affinities to human CD3 antigen 

scFv      Yield of scFv            Affinity constant   Affinity constant
variant   (µg/l of culture)a       (M^-1/10^7)c        (M^-1/10^7)d
E,C       72.7 ± 19.5f             1.09                n.d.g
Q,C       2314.7 ± 578.8           1.96                3.16
E,S       148.4 ± 37.7             1.27                2.49
Q,S       4846.0 ± 477.3           1.42                2.95

Entries whose row assignment was lost in extraction: 
... of scFv (%)b: 29.4 ± 1.9 
Stability e: 37.87, 46.63 

a Total amount of soluble scFv both in crude periplasmic [...] 
b Percentage of total scFv amount found in culture medium [...]lated by flow cytometry. 
c Binding constants as determined by cytofluorometric Scatchard analysis. 
d Binding constants as determined from a competitive inhibition assay using FITC-OKT3. 
e Activity (%) after 1 month at 4°C as determined by flow cytometry. 
f Arithmetic mean and standard deviation based on three independent experiments. 
g Not determined. 

The single exchange of Cys at position H100A by Ser also 
led to a twofold increase in soluble scFv. Three residues before 
the start of CDR-H3 is a conserved cysteine at position H92 
which forms a structural disulfide bond with position H22. 
Thus, having another Cys nearby (at H100A) could easily 
allow mis-folding where H100A instead of H92 is involved 
in forming the disulfide bond with H22, thereby generating a 
mis-folded, insoluble and non-functional product. Analogously, 
Ostermeier et al. (1995) demonstrated that substitution of an 
uncommon cysteine at position H50 (first amino acid of CDR- 
H2) by a serine led to a 20-fold increase in soluble Fv 
production. Since the authors of this work were working with 
an Fv fragment, a mutation in the V H domain can only 
influence the yield of the heavy chain fragment. Although 
direct comparisons of the influence of Cys residues in different 
CDRs on correct folding cannot be made, these results suggest 
that such uncommon residues may be more critical for the 
folding of a single antibody domain (V H ) than for scFv. 
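
The mispairing argument can be made concrete by counting the disulfide pairings that are open to the V H cysteines. This is a purely combinatorial sketch: it ignores geometry and the cysteines of the V L domain, and the function names are illustrative only.

```python
from itertools import combinations

# The structural disulfide of the V H domain named in the text
NATIVE_BOND = frozenset({"H22", "H92"})


def possible_pairings(cysteines):
    """All ways of joining two of the listed cysteines in one disulfide."""
    return [frozenset(pair) for pair in combinations(cysteines, 2)]


def non_native_pairings(cysteines):
    """Pairings other than the native H22-H92 bond."""
    return [sorted(pair) for pair in possible_pairings(cysteines)
            if pair != NATIVE_BOND]


with_cys = non_native_pairings(["H22", "H92", "H100A"])  # Cys at H100A
with_ser = non_native_pairings(["H22", "H92"])           # Ser at H100A

print("Cys at H100A:", with_cys)
print("Ser at H100A:", with_ser)
```

With Cys at H100A two non-native pairings exist, including the H22-H100A bond discussed above; replacing it by Ser leaves the native bond as the only option within V H.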

In our case, a cysteine was substituted that is present directly 
in the middle of CDR-H3, which is in the middle of the 
antigen-combining site and generally has the greatest influence 
on binding affinity. CDR-H3 plays a prominent role not only 
in ligand binding, but also in the contact with the V L domain 
and with the other CDRs (Padlan, 1994). Although cysteine 
can occasionally form hydrogen bonds, this is rare in proteins 
(Baker and Hubbard, 1984) and it is a relatively hydrophobic 
residue. We therefore considered two possible mutations at 
H100A: serine (maintaining the size as closely as possible, 
but introducing a very hydrophilic residue) and valine (increas- 
ing the hydrophobic nature, but adding an extra atom). Given 
that the residue is exposed to solvent in the model, we 
chose to make the mutation to serine since any increase in 
hydrophobicity could lead to a change in folding of the loop. 
We were aware, however, that the substitution might interfere 
with antigen binding or influence the contact between the 
variable domains. Fortunately, the Cys to Ser mutation had no 
effect on antigen binding and, as hoped, led to a significant 
improvement in the stability of the scFv. Although the exposure 
patterns of the various amino acid types in immunoglobulins 
are comparable to those in other water-soluble proteins, 
cysteines are more exposed in CDRs than they are in the 
framework regions (Padlan, 1994). This is especially true for 
short (10 residues or less) hypervariable loops which do not 
have much opportunity to bury one of their residues while 
maintaining a distorted hairpin conformation for antigen bind- 
ing. Exposure of the cysteine 100A SH group to solvent may 
result in oxidation or modification of the group over time and 
this may have an influence on the stability of the antigen- 
antibody complex. It is also possible that the unpaired cysteines 
of two adjacent scFv molecules could form a disulfide bond, 
thus giving rise to inactive and probably insoluble scFv dimers 
and causing a decrease in the concentration of functional scFv. 
These factors would explain the experimentally observed 
instability during the storage of scFv variants containing Cys 
in CDR-H3. 

It is worth noting that in the present work we actually 
compared two different strategies of folding, i.e. in vivo and 
in vitro. The renaturation procedure, which has been used to 
refold several antibody fragments (Kipriyanov et al., 1994; 
Gotter et al., 1995) and a more complex scFv::streptavidin 
fusion protein (Kipriyanov et al., 1996a), did not lead to the 
formation of an active scFv-E,C variant. These results point 
to limitations in the folding strategy in vitro compared with 
in vivo and indicate how such problems can be overcome. 

In conclusion, we have constructed a modified version of 
an anti-human CD3 scFv antibody fragment with improved 
stability in vitro and increased production level in bacteria. 
This molecule may be particularly useful for the creation of 
recombinant cytotoxic immunoconjugates. 

Structural Effects of Framework Mutations on a Humanized 
Anti-Lysozyme Antibody 1 

Margaret A. Holmes,* Timothy N. Buss,* and Jefferson Foote 2 ** 

A humanized version of the mouse anti-lysozyme Ab D1.3 was previously constructed as an Fv fragment and its structure was 
crystallographically determined in the free form and in complex with lysozyme. Here we report five new crystal structures of 
single-amino acid substitution mutants of the humanized Fv fragment, four of which were determined as Fv-lysozyme complexes. 
The crystals were isomorphous with the parent forms, and were refined to free R values of 28-31% at resolutions of 2.7-2.9 Å. 
Residue 27 in other Abs has been implicated in stabilizing the conformation of the first complementarity-determining region 
(CDR) of the H chain, residues 31-35. We find that a Phe-to-Ser mutation at 27 alters the conformation of immediately adjacent 
residues, but this change is only weakly transmitted to Ag binding residues in the nearby CDR. Residue 71 of the H chain has been 
proposed to control the relative disposition of H chain CDRs 1 and 2, based on the bulk of its side chain. However, in structures 
we determined with Val, Ala, or Arg substituted in place of Lys at position 71, no significant change in the conformation of CDRs 
1 and 2 was observed. The Journal of Immunology, 2001, 167: 296-301. 



Humanized Abs are created by replacing the complementarity- 
determining regions (CDRs) 3 of a human Ab (as defined by Wu 
and Kabat; Refs. 1, 2) with the corresponding CDRs of a 
nonhuman Ab (3). This CDR graft transfers 
the antigenic specificity of the CDR donor molecule, but leaves the 
new engineered molecule immunologically human, inasmuch as 
the immunogenicity of humanized Abs in humans is extremely low 
(4, 5). The first humanized Ab was specific for the hapten nitro- 
phenacetyl. This molecule had been CDR grafted in the H chain 
only, which was coexpressed with a mouse L chain. The human- 
ized anti-nitrophenacetyl showed 1.5- to 3-fold reduced hapten af- 
finity relative to a control molecule with murine sequences in both 
chains (3). This finding of altered affinity proved that framework 
residues can influence the structure of the Ag combining site. Riech- 
mann et al. (4) confirmed this finding in a humanized anti-CD52. 
The initial humanized construct showed weak avidity. A single 
Ser-to-Phe mutation at framework residue H27 4 restored avidity to 
near that of the fully murine control. The importance of framework 
residues in maintaining the structure of the CDRs and the frequent 
need for mutational revisions in the framework have since been 
confirmed many more times during the engineering of humanized 
Abs to have avidity matching that of their murine antecedents (6). 



We developed a humanized anti-lysozyme (HuLys) as a model 
system for studying structural issues attending the transfer of 
CDRs from a murine to a human framework (7-9). Thus, murine 
and human segments for the construction were chosen from among 
Ab V domains whose structures had been determined. The six 
CDRs of HuLys come from the murine Ab D1.3, which was raised 
against hen egg lysozyme (10, 11). The structure of the D1.3 
heterodimer of H and L chain V regions (Fv) has been determined 
at 1.8-Å resolution in both the liganded and unliganded forms 
(12, 13). The HuLys H chain framework (residues H1-H30, H36-H49, 
H66-H94, and H103-H113 in the Kabat numbering system) 
comes from the human myeloma protein NEW, whose structure 
has been determined at 2.0 Å (14). The κ L chain framework 
(residues L1-L23, L35-L49, L57-L88, and L98-L108) is a 
consensus sequence similar to that of the human Bence-Jones 
protein REI, also determined at 2.0 Å (15). 
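
The framework ranges listed above imply the positions of the H chain CDRs. As a minimal sketch, the function below classifies a Kabat-numbered H chain residue; the CDR ranges are taken as the gaps between the quoted framework segments, and the function name and the FR1 to FR4 labels are assumptions made for illustration.

```python
import re

# HuLys H chain framework segments as listed in the text (Kabat numbering)
H_FRAMEWORK = [(1, 30), (36, 49), (66, 94), (103, 113)]

# CDRs taken as the gaps between those framework segments
H_CDRS = {"CDR-H1": (31, 35), "CDR-H2": (50, 65), "CDR-H3": (95, 102)}


def classify_h_residue(label):
    """Classify a label such as 'H71'; a trailing insertion letter is ignored."""
    match = re.fullmatch(r"H(\d+)[A-Za-z]?", label)
    if match is None:
        raise ValueError(f"not an H chain residue label: {label!r}")
    number = int(match.group(1))
    for name, (start, end) in H_CDRS.items():
        if start <= number <= end:
            return name
    for index, (start, end) in enumerate(H_FRAMEWORK, start=1):
        if start <= number <= end:
            return f"FR{index}"
    raise ValueError(f"residue number out of range: {label!r}")


for residue in ("H27", "H32", "H71"):
    print(residue, classify_h_residue(residue))
```

Both mutated positions studied here, H27 and H71, fall in framework segments rather than in a CDR, which is the point of the experiment.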

The crystal structures of the HuLys Fv in free form (16) and 
complexed with the Ag lysozyme (17) were previously determined 
at 2.9 and 2.7 Å, respectively (Brookhaven Protein Data Bank 
accession numbers 1BVL and 1BVK). In this work, we describe 
crystal structures of a series of single substitution mutants of the 
HuLys Fv, viz H27S, H71V, H71A, and H71R. 

4 Residues are numbered using the Kabat system and preceded by a chain designator, 
e.g., H71 for residue 71 in the H chain. The wild-type Fv has Phe at residue H27 and 
Lys at residue H71; mutant molecules are designated by the substitution, e.g., H71V 
is an Fv with Val at residue H71. 



Materials and Methods 

Protein engineering 

Fvs were expressed in Escherichia coli using the pAK19 vector (18), which 
uses a phoA promoter and heat-stable enterotoxin II leader sequence. This 
vector directs gene products to the periplasm, from which correctly folded, 
disulfide-oxidized molecules are released after cell harvest. Material used 
in the present work was released from the periplasm by osmotic shock and 
purified by affinity chromatography on lysozyme-Sepharose, as described 
previously (17). Protein concentrations were determined 
spectrophotometrically, using calculated extinction coefficients (19). 

Crystal growth 

Crystals of the four mutant complexes were grown in the same way as the 
native complex crystals (17). Each of the HuLys Fv solutions was mixed 
with a lysozyme solution in equimolar proportions. The mixtures then sat 
from several hours to 2 days. PBS was added to dilute the solution, which 
was centrifuged before use. Protein concentrations ranged from 6.5 to 10.5 
mg/ml. The reservoir for vapor diffusion was 0.8 M K2HPO4, 0.8 M 
NaH2PO4, 0.1 M HEPES, pH 6.5. Sitting drops consisting of equal 
volumes of complex solution and reservoir solution were set up in 
microbridges. 

Table I. Data collection 

(Column headings naming the five data sets were lost in extraction. 
Values are given in order of appearance as overall / highest-resolution 
shell.) 

                        Set 1           Set 2             Set 3             Set 4             Set 5
Measured reflections    ... / >2580     141,035 / >3001   108,158 / >2271   141,201 / >2672   88,937 / >1849
Unique reflections      23,047 / 1060   22,318 / 1133     21,229 / 1044     22,707 / 1025     16,618 / 751
(row label lost)        0.065 / 0.315   0.067 / 0.404     0.059 / 0.310     0.065 / 0.376     0.075 / 0.403
Average I/σ             10.1 / 2.6      18.1 / 2.3        17.4 / 2.6        16.5 / 2.3        17.2 / 2.5

Cell dimensions (Å): only fragments survive ("a=b=91.", ".9; c=173.3", ".6; c=174.1"). 
Resolution (Å): surviving entries read 50.0-2.7 (overall) and 2.75-2.70 (highest shell). 
Completeness (%): surviving values, in order: 96.3, 90.3, 95.1, 97.5, 90.5, 95.6, 89.0, 92.3, 87.0. 

Crystals of the uncomplexed H71V Fv were grown by macroseeding. 
The seeds were obtained from a hanging drop vapor diffusion 
crystallization that used 16 mg/ml protein and a reservoir of 0.74 M 
sodium citrate, 0.01% NaN3, pH 6.5. Two rounds of seeding were performed. 
Each time, a few crystals were removed from the drop and placed in a 
microbridge in a fresh drop composed of equal volumes of Fv solution 
(16 mg/ml) and reservoir solution (0.8 M sodium citrate, 0.01% NaN3, pH 6.5). 

Data collection 

X-ray diffraction data sets were collected from single crystals at 4°C using 
an R-AXIS detector. The data sets were processed with DENZO and 
SCALEPACK (20, 21). Details of the processing are given in Table I. Before 
refinement, the data sets were partitioned into a working set and a test set. 
The test sets for the complexes contained only reflections that had made up 
the test set for the refinement of the native complex structure, so as to 
maintain the independence of the test set (22). The test set for the uncom- 
plexed Fv was created by X-PLOR (23), as the refinement of the native Fv 
structure did not involve a test set. 
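
The two ways of choosing a test set described above can be sketched in a few lines. This is an illustration of the bookkeeping only, not the authors' X-PLOR procedure; the function names, the 10% default fraction, and the toy Miller indices are assumptions.

```python
import random


def inherit_test_set(mutant_hkl, native_test_hkl):
    """Split mutant reflections so the test set reuses only native test reflections."""
    native_test = set(native_test_hkl)
    test = [hkl for hkl in mutant_hkl if hkl in native_test]
    working = [hkl for hkl in mutant_hkl if hkl not in native_test]
    return working, test


def random_test_set(hkl_list, fraction=0.1, seed=0):
    """Fresh random split for a data set with no earlier test set to inherit."""
    rng = random.Random(seed)
    shuffled = list(hkl_list)
    rng.shuffle(shuffled)
    n_test = round(fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


# Toy Miller indices, invented for the example
mutant = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
native_test = [(0, 1, 0), (2, 2, 2)]

working, test = inherit_test_set(mutant, native_test)
print("working:", working)
print("test:", test)
```

Inheriting the flags matters because the mutant refinements start from the native model: a reflection that was used to refine the native structure would no longer be an unbiased check if it were moved into the mutant's test set.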

Refinement 

Refinement of the structure of the HuLys H27S Fv-lysozyme complex 
began with the model of the native complex with residue H27 changed to 
Gly. A round of rigid body refinement at 3.5-Å resolution was followed by 
rounds of positional refinement at 2.7-Å resolution using X-PLOR and 
model building of the loop containing the mutation. A cycle of torsion 
angle molecular dynamics refinement was run, followed by more rounds of 
positional refinement and model building. Omit map density was sufficient 
to model only one of the two H27 side chains. The refinement was 
completed with a cycle of individual B value refinement with TNT (24) and a 
cycle of X-PLOR B value refinement. Refinement statistics are given in 
Table II. 

Refinement of the structures of the HuLys H71V, H71A, and H71R 
Fv-lysozyme complexes was more straightforward. The starting model was 
the native complex with H71 changed to Gly. A round of rigid body 
refinement at 3.5-Å resolution was followed by a cycle of positional 
refinement at 2.7-Å resolution, addition of the H71 side chains to the 
model, and a second cycle of positional refinement. The refinement was 
completed with one cycle each of individual B value refinement with TNT 
and X-PLOR (Table II). Manual changes of the model, other than placement 
of the H71 side chain, were needed only for the H71A complex. 

Refinement of the structure of the uncomplexed HuLys H71V Fv began 
with H71 changed to Gly. First, a round of rigid body refinement was 
conducted at 3.5-Å resolution. Next came two rounds of positional 
refinement at 2.9 Å, alternating with model-building and addition of the 
H71 side chains. Group B values (1 B per residue) were refined with 
X-PLOR, and a final cycle of positional refinement was performed (Table II). 

No solvent molecules are present in any of the models. PROCHECK 
(25) analyses of the five structures show no residues in disallowed regions 
other than L51, which is in a γ-turn conformation, as seen in the native and 
other related structures (26). 

Results 

H27S structure 

The structure of the HuLys Fv mutant H27S was determined as a 
lysozyme complex in a crystal form identical with the complex 
structure obtained previously (17). The crystallographic asymmetric 
unit contains two Fv:Ag complexes, which we designate molecule 1 
and molecule 2. Both Fvs superpose well on the corresponding 
Fvs of the H27F structure, with root mean square (rms) 
differences in Cα position of 0.5 Å for each of the two complexes. 
Despite these identical rms differences, two different conformations 
are present in the two crystallographically independent H27S 
molecules. Comparing molecule 1 of H27F and H27S, differences 
in Cα position of up to 2.7 Å occur at residues H23-H29, adjacent 
to CDR-H1. The overall effect is that in the H27S structure, this 
portion of the molecule has moved away from the position of the 
Phe side chain present in H27F, toward the H chain N terminus and 
lysozyme, creating a more open loop (Fig. 1). Residues H74-H76, 
which pack against CDR-H1, have moved into the space created 
by this shift. Eight of the Cα shifts larger than twice the rms 
difference come from residues H23-H29 and H74 and H76. (The 
others are at chain termini or at locations remote from the 
combining site.) Modeling of residues H22-H31 was difficult, and the 
side chain at position H27 could not be fit at all. The possibility 
exists that this remodeled region is in more than one conformation. 

FIGURE 1. Structural effects of different amino acids at position 
H27, molecule 1. Stereo view of H chain CDR 1 and adjacent peptide 
segment in H27F (black) and H27S (gray). Atomic coordinates were 
taken from molecule 1 from the H27F and H27S Fv-lysozyme complex 
structures. The entire H27S-lysozyme complex was superposed on the 
H27F-lysozyme complex. The superposed molecules were used for this 
illustration. Only the peptide backbone from residues H22 to H35 and 
the side chains from residues H27 and H28 were drawn, to make clear 
the conformational changes that occur when Ser or Phe is substituted 
at position H27. Residue H27 in the H27S mutant was modeled as Gly. 

Table II. Refinement 

(Column headings naming the five structures were lost in extraction; 
columns are numbered in order of appearance. Column 5, at 2.9 Å, 
corresponds to the uncomplexed Fv.) 

                                          1          2          3          4          5
Resolution (Å)                            10.0-2.7   10.0-2.7   10.0-2.7   10.0-2.7   10.0-2.9
Reflections
  Total (F>2σ)                            20,005     19,501     18,592     19,535     14,455
  Working set                             18,105     17,642     16,805     17,653     13,014
  Test set                                1,900      1,859      1,787      1,882      1,441
Atoms                                     5,478      5,486      5,482      5,494      3,484
R value
  Working                                 0.203      0.202      0.202      0.207      0.225
  Free                                    0.313      0.291      0.291      0.297      0.279
rms deviation from ideality
  Bond lengths (Å)                        0.015      0.014      0.015      0.014      0.021
  Bond angles (°)                         1.8        1.8        1.9        1.8        2.5
PROCHECK analysis
  % in most favored regions               80.9       81.5       80.5       78.8       78.6
Estimated error in atomic position (Å)    0.33       0.34       0.33       0.34       0.37
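
A quick consistency check on the Table II reflection counts and R values is sketched below. The lists follow the column order of the table as extracted (the column headings did not survive), and the derived quantities are simple arithmetic, not values reported by the authors.

```python
# Values from Table II, in column order as extracted
total = [20005, 19501, 18592, 19535, 14455]    # reflections with F > 2 sigma
working = [18105, 17642, 16805, 17653, 13014]  # working set
test = [1900, 1859, 1787, 1882, 1441]          # test set
r_work = [0.203, 0.202, 0.202, 0.207, 0.225]
r_free = [0.313, 0.291, 0.291, 0.297, 0.279]

# Working and test sets should partition the total
sums_agree = [w + t == n for w, t, n in zip(working, test, total)]

# Fraction of reflections held out for the free R value
test_fraction = [t / n for t, n in zip(test, total)]

# Gap between free and working R values
r_gap = [round(f - w, 3) for f, w in zip(r_free, r_work)]

print(sums_agree)
print([f"{x:.3f}" for x in test_fraction])
print(r_gap)
```

The counts add up in every column and the test sets hold just under 10% of the reflections. The free R values span the 28-31% range quoted in the abstract.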

H27S molecule 2 shows a clear difference from the corresponding 
molecule 2 of the H27F Fv. The Ser and Phe side chains at the 
substitution site point in opposite directions. As evident in Fig. 2, 
the phenyl ring of H27F is buried in the interior of the extended 
loop formed by residues H23-H35, whereas the Ser side chain in 
H27S points to the aqueous exterior. As predicted (4, 9), 
substitution of Ser for Phe has created a cavity. Residue Ser H28 in 
H27S Fv has shifted so that its main chain and side chain have 
moved into space occupied by the Phe H27 side chain in the H27F 
Fv. This large perturbation in backbone conformation extends for 
several residue positions along the peptide backbone, as is evident 
in Fig. 2. The shifts of the Cα atoms of residues H23-H31 account 
for 9 of the 13 shifts greater than twice the rms difference between 
the mutant and native complexes. A shift at H75 accounts for one 
more, and the others are at chain termini. 



Although the conformation of the loop preceding CDR-H1 differs 
significantly in H27S and H27F, structural effects on lysozyme 
binding are small. In the D1.3 complex structure, residue H32 of 
CDR-H1 makes a weak (3.5 Å) direct contact with lysozyme. 
Residues H30 and H31 make contact via water molecules (13). In the 
HuLys H27S structure, the distance for the potential direct contact 
between H32 and lysozyme is 4.1 Å (molecule 1) or 4.3 Å 
(molecule 2), similar to the 4.0-Å contact seen in the H27F molecule 1 
complex and an increase from the 3.4 Å in the H27F molecule 2 
complex, and too large to be important in lysozyme binding (28). 
Due to the resolution of x-ray data for the HuLys complexes, we 
have not modeled water molecules, hence we cannot directly 
compare the Fv-lysozyme interactions involving residues H30 and H31 
to the corresponding interactions in D1.3. However, we did 
compare the positions of the Fv atoms in H27S and H27F involved in 
these contacts, the carbonyl oxygen atoms of H30 and H31. Both 
these atoms in H27S molecule 1 have moved 0.8 Å from their 
positions in H27F. In H27S molecule 2, the backbone Cα atoms of 
these residues have moved 1.9 Å (H30) and 1.2 Å (H31) from their 
positions in the H27F complex. The atoms actually forming the 
contacts, H30 O and H31 O, have moved 1.7 Å and 0.9 Å, 
respectively. The size of this shift does not necessarily mean that 
these contacts are broken. The water molecules in the H27S 
complex presumably could shift position to accommodate the new 
positions of the protein atoms. The remainder of CDR-H1 in H27S 
is offset from its location in H27F, with the respective chains back 
in register by residue H35, the last residue in the CDR. 

Table III. Rms changes in Cα position of H71 mutants relative to 
H71K 

            RMS Deviation (Å)
Mutant      Molecule 1    Molecule 2
H71V        0.3           0.3
H71A        0.2           0.2
H71R        0.1           0.1

H71 structures 

The size of the side chain at position H71 is thought to control the 
relative disposition of loops forming CDR-H1 and CDR-H2 (29). 
Previously published structures of free HuLys Fv and the HuLys- 
lysozyme complex had Lys in this position. Here we report addi- 
tional structures with Val, Ala, and Arg at residue H71. All three 
forms crystallized and were determined as an Fv-lysozyme com- 
plex, and a structure of the free H71V Fv was obtained as well. 

All the Fv-lysozyme complexes were virtually identical. 
Superposition of the Cα atoms of the mutant complexes onto the H71K 
complex gave small rms differences of 0.3 Å or less, as presented 
in Table III. Twelve Cα atoms in the two H71V molecules have 
shifts greater than twice the rms differences, and none are near the 
combining site. Comparing the structures of the H71A and H71K 
complexes, four Cα atoms have shifts greater than twice the rms 
difference; three are in the L chain and one is in lysozyme. All are 
remote from the combining site. The most conservative H71 
substitution, arginine for lysine, gave the smallest overall rms 
difference. However, as for the other H71 mutants, there were moderate 
shifts of the mutated residue and residues in the nearby segment of 
polypeptide chain. The Cα atoms of H71 in molecules 1 and 2 
moved 0.5 Å and 0.3 Å, respectively, and the preceding Cα atoms 
in molecule 1, H69 and H70, moved 0.2 Å and 0.4 Å. All other 
shifts greater than twice the rms distance occurred distant from 
H71 and from the combining site. Fig. 3 shows superposition of 
H71 and parts of CDR-H1 and CDR-H2 for the four molecules, 
taken from the complexed crystal forms. This illustration shows 
clearly that there is no change in structure of the two CDRs, 
despite the mutations at H71. 
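
The criterion used throughout the Results, a Cα shift "greater than twice the rms difference", can be written out as a short sketch. It assumes the two structures have already been superposed; the function names are illustrative and the coordinates are invented toy values, not taken from the deposited structures.

```python
import math


def ca_shifts(coords_a, coords_b):
    """Distance between matching C-alpha positions of two superposed structures."""
    return {res: math.dist(coords_a[res], coords_b[res])
            for res in coords_a if res in coords_b}


def rms(values):
    """Root mean square of a collection of distances."""
    values = list(values)
    return math.sqrt(sum(v * v for v in values) / len(values))


def large_shifts(shifts, factor=2.0):
    """Residues whose shift exceeds factor times the rms shift."""
    cutoff = factor * rms(shifts.values())
    return {res: d for res, d in shifts.items() if d > cutoff}


# Toy coordinates (angstroms): five residues move 0.1, one moves 1.0
native = {f"H{n}": (float(n), 0.0, 0.0) for n in range(66, 72)}
mutant = {res: (x + 0.1, y, z) for res, (x, y, z) in native.items()}
mutant["H71"] = (native["H71"][0] + 1.0, 0.0, 0.0)

shifts = ca_shifts(native, mutant)
print(f"rms shift: {rms(shifts.values()):.2f}")
print("flagged:", sorted(large_shifts(shifts)))
```

Because the cutoff scales with the overall rms difference, a structure with a very small rms (such as H71R, 0.1 Å) flags shifts that would pass unnoticed in a comparison with a larger rms, which is worth keeping in mind when comparing counts of flagged residues between mutants.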




FIGURE 3. Structure of residue H71 and first and second hypervariable 
loops in four lysozyme-Fv complexes. Conformations of these residues in 
the two crystallographically independent asymmetric units of all four 
mutants are essentially identical. This illustration is a composite of 
superposed molecule 2s seen in H71K (black line), H71V (gray line), H71A 
(dotted line), and H71R (dashed line). Superpositions were based on H71K 
molecule 2 and used the Cα atomic coordinates of the residues shown. 

FIGURE 4. Structures of residue H71 and first and second hypervariable 
loops in unliganded Fv molecules. Top, H71V molecule 1 (gray line) 
superposed on H71K molecule 1 (black line). Middle, H71V molecule 2 
(gray) superposed on H71K molecule 2 (black). Bottom, H71K molecule 2 
(gray) superposed on H71K molecule 1 (black). 



The structures of uncomplexed H71V and H71K offer another 
opportunity to test for a mutation-induced conformation change 
following the Tramontano model. The two unliganded crystal 
forms of H71K and H71V each have two molecules in the asym- 
metric unit, hence comprise a total of four independent Fv struc- 
tures. Molecule 1 of H71K and molecule 1 of H71V superpose 
almost exactly in the region of the mutation, as evident in Fig. 4, 
top. Molecule 2 of H71K and molecule 2 of H71V superpose sim- 
ilarly well (Fig. 4, middle). However, these two pairs represent 
distinct conformations. The two independent molecules of H71K 
do not superpose well (Fig. 4, bottom), and the same is true for 
molecule 1 and molecule 2 of H71V. In other words, two Fvs 
differing at residue H71, but in identical crystal packing 
environments, are closer in conformation than two with the same 
sequence, but in different environments. The two unliganded 
conformations observed presumably are distinct because of crystal 
packing interactions, rather than amino acid sequence differences 
at residue 71. The conformation of H71 and the two loops in the 
H71K and H71V Fv-lysozyme complexes is intermediate between 
the two conformations in the unliganded structures, though closer 
to molecule 2 (rms difference 0.2 Å for the Cα atoms in the 
illustration superposed on molecule 2 of the H71K complex) than to 
molecule 1 (0.4 Å rms difference). 

Discussion 

The role of residues H26-H30 in Ag binding by humanized Abs 
has been ambiguous. This segment is not considered part of 
"Kabat" CDR-H1 (residues H31-H35), and these residues rarely 
contact Ag (30). Other homology- and structure-based definitions of 
the first Ig H chain CDR have similarly designated residues outside 
this segment, viz H31-H37 (31), H31-H32 (32), and H30-H35 
(33). One exception is the canonical H chain hypervariable loop 1 
proposed by Chothia and Lesk (34), extending from H26 to H32. 
The rationale for this assignment was that the segment forms a 
loop connecting two β-strands of rather standard geometry. The 
conservation of particular features in the N-terminal portion of the 
segment, such as an invariant Gly at position H26, was considered 
critical for maintaining the backbone conformation of the 
Ag-contacting C-terminal portion of the H26-H32 loop. 

How are conformational changes in CDR-H1 transmitted from 
the H26-H30 region? Comparison of side-by-side crystal 
structures of mouse and humanized versions of the same Ab would 
seem a straightforward way to discover this mechanism, as 
identical CDR-H1 sequences are abutted in the two cases to H26-H30 
regions of separate murine and human origin. However, existing 
structural data on humanized Abs have been equivocal. 

The canonical H26-H32 structure, which the vast majority of 
Abs adopt (35), is typified by the human H chain NEWM (14, 34). 
The rat anti-CD52 Ab CAMPATH-1G, with H26-H30 sequence 
GFTFT, follows this canonical structure precisely (36). The initial 
humanized form, though based on NEWM frameworks, sequence 
GSTFS, bound Ag poorly, and probably did not adopt a canonical 
conformation. The crystallographically studied humanized form, 
CAMPATH-1H, had higher affinity by virtue of the H26-H30 
region being reverted to the rat sequence. Nevertheless, this structure 
still differed from the canonical conformation at residues H29 and 
H30. This deviation was attributed to a different interaction with 
respective side chains at position H71 (Arg in CAMPATH-1G, Val 
in CAMPATH-1H). A recent structure of the same humanized 
molecule in complex with an Ag mimotope showed that the 
H26-H32 loop was once again in the canonical conformation (37). 

CDR-H1 of the murine anti-γ-IFN Ab AF2 deviates in 
conformation from the canonical structure at each position, but is still 
topologically recognizable as a loop (38). In contrast, the 
humanized version of AF2, despite having an identical sequence from 
residue H19-H66, has an α-helical CDR-H1 not seen in any other 
Ab structure. This unique conformation was attributed to a second 
structural rearrangement in framework 1 associated with a Pro 
(mouse) to Ser (humanized) mutation at position H7. 

The murine anti-lysozyme Ab D1.3 has a canonical CDR-H1 
structure (11). The humanized version of D1.3 whose structure we 
previously reported (16, 17) has an identical sequence from 
H26-H35 (7) (H26-H30 sequence GFSLT) and also adopts a canonical 
CDR-H1 conformation. A kinetic study of HuLys mutants showed 
that a Ser substitution at residue H27 had only a slightly 
detrimental effect on Ag affinity (9). This observation was contrary to the 
profound effect of a Ser-to-Phe mutation in CAMPATH-1H, even 
though both HuLys and CAMPATH-1H used NEWM framework 
sequences (4). One possible explanation is that the mutation in 
HuLys caused no significant structural change. The finding that 
CDR-H1 of D1.3 contributes little free energy toward lysozyme 
binding (39) makes plausible an alternative possibility, that the 
mutation did cause a change in residues H26-H30, but this 
perturbation was not detectable by kinetic analysis. Crystallographic 
data presented here favor the latter proposition, made clear in Fig. 
1. The HuLys H27S structure shows large changes in backbone 
conformation in residues H22-H30 in molecule 1 and H26-H30 in 
molecule 2, but these torsional changes are not transmitted to the 
nearby Ag binding residues H31 and H32. Translational changes 
are also not transmitted to these residues, except for a displacement 
of H31 in molecule 2. Given our findings and the apparent 
idiosyncrasies observed in other humanized Ab structures, we can only 
conclude that the conformation of CDR-H1 and the adjacent 
H26-H30 region are extremely sensitive both to their own sequences 
and to interactions with adjacent residues. Our understanding of 
structural determinants of H26-H35 and our ability to rationally 
manipulate this region remain limited. 

Tramontano et al. (29) have articulated a descriptive and 
predictive model for the structures of the H chain hypervariable loops 
1 (Kabat residues H26-H32) and 2 (Kabat residues H52a/53-H55). 
In this model, the most important determinants of the 
conformation of hypervariable loop 2 are the length of the loop and 
specific sequence constraints, with particular canonical structures 
and conserved residues expected for 3, 4, and 6 residue loops. A 
further structural determinant is the side chain of residue H71, 
which is significant in the following way. The position of 
hypervariable loop 1 is essentially fixed. The position of loop 2 relative 
to loop 1 is variable, and depends on whether a large side chain at 
H71 packs between the two loops and separates them or a small 
side chain at H71 allows loops 1 and 2 to juxtapose. 

In HuLys crystal structures with four different side chains at
residue H71, the expected conformational rearrangement of the
hypervariable loop 2 region is not observed. The absence of a
mutation-induced conformation change cannot simply be due to
the stabilizing effect of a bound Ag, because the Lys-to-Val
mutation in the unliganded crystal forms also does not alter the
position of loop 2. The modest (0.4-0.6 kcal/mol) improvement in
affinity that accompanied this mutation thus cannot be attributed to
relieving an inappropriate displacement of hypervariable loop 2
(9). Our findings do not invalidate the Tramontano model, for
which other proof exists, including a specific mutational study of
residue H71 in the crystallographically determined Ab B72.3 (40).
Our data do demonstrate that a class of exceptions may exist in
which the H71 side chain alone does not affect the separation of
hypervariable loops 1 and 2. An unknown sequence determinant
may override the action of H71, or the compact nature of 3-residue
hypervariable loops (H53-H55) may confer less sensitivity to the
bulk of the H71 side chain.
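
For scale, a free-energy difference can be converted into a fold change in the equilibrium binding constant through the relation ddG = RT ln(K_mutant/K_parent). The sketch below performs that conversion for the 0.4-0.6 kcal/mol range quoted above. The temperature of 298.15 K is an assumption (the measurement temperature is not restated here), and `fold_change` is a hypothetical helper name used only for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change(ddg_kcal_per_mol, temp_k=298.15):
    """Fold change in binding constant for a free-energy gain (kcal/mol).

    temp_k defaults to 298.15 K, which is an assumption, not a value
    taken from the study.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temp_k))


low = fold_change(0.4)
high = fold_change(0.6)
print(f"0.4-0.6 kcal/mol corresponds to about {low:.1f}- to {high:.1f}-fold")
```

Under that assumption the quoted range corresponds to roughly a two- to threefold change in affinity, which is consistent with calling the improvement modest.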

The observation that significant conformational changes in the
H27S mutant did not lead to much change in Ag affinity, whereas
substitutions at H71 gave affinity differences but no apparent
explanatory change in structure, illustrates the value of combining
structural and kinetic studies.

Summary

During a germinal center reaction, random mutations are introduced into immunoglobulin V
genes to increase the affinity of antibody molecules and to further diversify the B cell
repertoire. Antigen-directed selection of B cell clones that generate high affinity surface Ig results in
the affinity maturation of the antibody response. The mutations of Ig genes are typically
base-pair substitutions, although DNA insertions and deletions have been reported to occur at a low
frequency. In this study, we describe five insertion and four deletion events in otherwise
somatically mutated V H gene cDNA molecules. Two of these insertions and all four deletions
were obtained through the sequencing of 395 cDNA clones (~110,000 nucleotides) from
CD38+IgD− germinal center and CD38−IgD− memory B cell populations from a single
human tonsil. No germline genes that could have encoded these six cDNA clones were found
after an extensive characterization of the genomic V H 4 repertoire of the tonsil donor. These six
insertions or deletions and three additional insertion events isolated from other sources
occurred as triplets or multiples thereof, leaving the transcripts in frame. Additionally, 8 of 9 of
these events occurred in the CDR1 or CDR2, following a pattern consistent with selection,
and making it unlikely that these events were artifacts of the experimental system. The lack of
similar instances in unmutated IgD+CD38− follicular mantle cDNA clones statistically
associates these events with the somatic hypermutation process (P = 0.014). Close scrutiny of the 9
insertion/deletion events reported here, and of 25 additional insertions or deletions collected from
the literature, suggests that secondary structural elements in the DNA sequences capable of
producing loop intermediates may be a prerequisite in most instances. Furthermore, these events
most frequently involve sequence motifs resembling known intrinsic hotspots of somatic
hypermutation. These insertion/deletion events are consistent with models of somatic
hypermutation involving an unstable polymerase enzyme complex lacking proofreading capabilities, and
suggest a downregulation or alteration of DNA repair at the V locus during the hypermutation
process.



During the course of a T cell-dependent antibody
response, B cells hone the specificity of their antibody
molecules through a process of random somatic
hypermutation of their V genes, followed by antigen-driven
selection. This is collectively referred to as affinity maturation.
This process occurs within the germinal centers (GCs)¹ of
secondary follicles from peripheral lymphoid organs when
antigen-stimulated B cells receive proper signals from T and
accessory cells. In the human system, GC B cells are
characterized by the surface expression of CD38 and, in most
cases, the loss of IgD (1-3). We have previously shown that
the initiation of somatic hypermutation occurs within the
CD77+ subset of these IgD−CD38+ B cells (4). Mutated V
genes can be isolated from all subsequent stages of B cell
differentiation and in cells from all IgD− and certain IgD+
B cell subsets (4, 5). The molecular process of somatic
hypermutation remains elusive, primarily due to the lack of a
good in vitro model until very recently (6). Much of what
is known concerns: (a) localizing the somatic hypermuta- 
tion process to particular B cell subsets and anatomical set- 
tings (4, 7-10); (b) delineating the limits and rates of muta- 
tional activity (11); (c) determining the minimal substrate 
through transgenic technology (12, 13); and (d) analyzing 
the mutations themselves in the context of the surrounding 
sequence to reveal tendencies such as strand polarity and 
"hotspots" of somatic hypermutation (for reviews see refer- 
ences 12 and 13). 

Although somatic hypermutation is typically described as 
the generation of bp substitutions, insertions and deletions 
have been sporadically described. As with somatic point 
mutations, the analysis of these events can provide valuable 
information concerning somatic hypermutation itself.
Analysis of human V H 4 family genes generated from the
amplification of cDNA from somatically mutated GC (IgD−CD38+)
and memory (IgD−CD38−) B cell subpopulations
led us to identify a number of cDNA clones from the
mutated cell populations that contained insertions and
deletions. We provide evidence that these events are linked to
the somatic hypermutation process. Additionally, these 
events occur in a predictable fashion relative to the sur- 
rounding sequence, suggesting a model for their occur- 
rence with implications for the molecular process of so- 
matic hypermutation. 



Materials and Methods 

Isolation, Labeling, and Sorting of Tonsil B Cells. Human tonsils
were obtained during routine tonsillectomy. B cell isolation and
sorting for CD38 and IgD expression were performed as
previously described (4, 14). In brief, human tonsillar B cells were
separated into IgD+CD38− follicular mantle (FM) B cells, IgD−CD38+
GC B cells, and IgD−CD38− memory B cells to 95-98%
purity as predicted by FACS® analysis, as previously described (13).
The mutation state of the V H gene cDNA clones from the various
subpopulations was in agreement with our previous study (4).
Clones were considered somatically mutated if they contained two
or more bp substitutions, well beyond the expected error rates for
the avian myeloblastosis virus reverse transcriptase (AMV-RT),
Taq, and PFU polymerases used in these analyses (this mutation
rate is based on our previous analyses; reference 4).

Sequencing the Ig V H Transcripts. Total RNA was extracted
from 1-5 × 10^5 B cells using guanidinium thiocyanate-phenol-chloroform
in a single step using the Ultraspec RNA isolation
system (BIOTECX Laboratories, Houston, TX), and was reverse
transcribed using oligo-d(T) or specific V gene constant region
oligonucleotides Cμ12 (5'-CTGGACTTTGCACACCACGTG-3')
for IgM transcripts or C7I8O (5'-CTGCTGAGGGAGTAGAGTCC-3')
for IgG transcripts and SuperScript II reverse
transcriptase (GIBCO BRL, Gaithersburg, MD). First strand cDNA
was used directly for second strand synthesis and amplification via
PCR using internal primers corresponding to the Cμ or Cγ
constant regions in combination with V H 4 or V H 6 family-specific
leader oligonucleotides: C7HO, 5'-GGCAAGGTGTGCACGCCGCTG-3';
Cμ10, 5'-TCTGTGCCCTGCATGACGTC-3'; L-4,
5'-ATGAAACACCTGTGGTTCTT-3'; L-6,
5'-ATGTCTGTCTCCTTCCTCAT-3'. The PCR products were purified using
microconcentrators (Amicon, Beverly, MA), and then were
kinased and blunt-end ligated into an EcoRV-digested and
dephosphorylated pBluescript plasmid (Stratagene, La Jolla, CA;
Polynucleotide Kinase, T4 DNA Ligase, and EcoRV were from Boehringer
Mannheim, Amsterdam, Netherlands). After transformation by
electroporation into electro-competent DHlOa Escherichia coli
(GIBCO BRL) and screening with consensus internal
oligonucleotides as previously described (4, 15), positive colonies were
picked, plasmid mini-preparations were made, and clones were
sequenced in both directions using an automated DNA sequencer
and automated sequencer protocol (ABI-377; Advanced
Biotechnologies Inc., Columbia, MD). All sequences were analyzed
using DNAstar (DNAstar Inc., Madison, WI). In the first tonsil
analyzed, 583 clones were picked, plasmid mini-preparations were
made, and Southern blots were prepared by standard methods.
These blots were screened with a set of oligonucleotides specific
for the various V H 4 family genes. Only those clones that screened
positive with constant region probes but negative for the various
V H 4 complementarity-determining region (CDR) 1-specific probes
were sequenced (395 of 583 clones), thus enriching the
somatically mutated populations analyzed, in that the CDR1 probes
should anneal only to the sequences most similar to germline.
The frequency of the occurrence of these events can therefore
only be estimated to lie between 6 out of 583 and 6 out of 395
clones (1-2%). Any sequence of interest was resequenced in both
directions to ensure sequence fidelity.
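
The bounds quoted above follow from two limiting cases: either none of the unsequenced clones carried an event (denominator 583) or only the sequenced clones are counted (denominator 395). A minimal sketch of that arithmetic is given below; `frequency_bounds` is a hypothetical helper name, not something taken from the study.

```python
def frequency_bounds(events, sequenced, picked):
    """Return (lower, upper) event frequency as fractions of clones.

    lower: assumes the clones that were picked but not sequenced
           carried no events.
    upper: counts only the clones that were actually sequenced.
    """
    lower = events / picked
    upper = events / sequenced
    return lower, upper


lower, upper = frequency_bounds(events=6, sequenced=395, picked=583)
print(f"{lower:.1%} to {upper:.1%} of clones")
```

Both limits fall within the 1-2% range stated in the text.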

Characterizing the Genomic Repertoire. Total genomic DNA was
isolated from FM B cells (IgD+CD38−) using the Puregene DNA
isolation kit (Gentra Systems, Inc., Minneapolis, MN). V H 4 genes
were amplified using a V H 4 leader-specific primer (L-4, as above)
and a primer specific for all V H 4 gene family heptamer-nonamer
spacer regions as previously described (16). PCR products were
agarose gel purified, then cloned into E. coli as described above for
the cDNA clones. Clones identified in the cDNA analysis that
contained insertions or deletions were used to design PCR primers to
amplify both the exact sequence of clones with insertions/deletions
as found and the predicted sequences based on the proposed
germline counterparts. Oligonucleotides used in this analysis
(format is as follows: clone: exact/predicted):
g64: 5'-GGACGGGTTGTACTTGGTTCC-3'/5'-GGACGGGTTGTAGGTCTCC-3';
g144: 5'-TCTTGAGGGACGGGTTGGTGT-3'/5'-TCTTGAGGGACGGGTTGT-3';
g187: 5'-CAGCTCCAGTAGTAAGCCCCG-3'/5'-CAGCTCCAGTAGTAACCACCG-3';
g188: 5'-GAGGGATTGTAGTTGGAGCC-3'/5'-GAGGGGTTGTAGTTGGTCCC-3';
g192: 5'-CCAGCCCCAGTAGTAGTAACT-3'/(same); and
g80: 5'-GCGGATCCAATACCTCACACT-3'/5'-GCGGATCCAGTAGTAACC-3'.

Sequence Availability. All cDNA sequences with insertions or 
deletions, and any genomic sequences unique to the literature as de- 
scribed in the results section are available from EMBL/Genbank/ 
DDBJ under accession numbers AF013615 through AF013626. 

Assay for Screening V H Gene Lengths. To facilitate the analysis
of large numbers of V H gene transcripts for the presence of
insertions or deletions, first strand cDNA produced as described above
was PCR amplified using Expand high fidelity polymerase
(Boehringer Mannheim) to reduce errors resulting from Taq
polymerase alone. The products of this PCR amplification were cloned
as described above and screened using 32P-labeled, gene-specific
oligonucleotides (V H 4-39: 5'-ATTGGGAGTATCTATTATAGT-3';
L-6 as above). Positive colonies were picked and used to
inoculate overnight cultures. A 1-μl aliquot from each 24-h culture was
used to directly inoculate 25-μl PCR amplification mixtures in
96-well-format PCRs. The internal PCRs used 32P-labeled,
gene-specific oligonucleotides to amplify a 230-base fragment
including the V H 4-39 CDR1 (L-4, as above, and V H 4-39-3':
5'-GCTCCCACTATAATAGATACT-3') or, for analysis of V H 6
genes, a 166-nucleotide fragment including the CDR1 and CDR2
of V H 6 (V H 6FW1: 5'-TGCCATCTCCGGGGACAGTGT-3',
V H 6FW3: 5'-TGTGTCTGGGTTGATGGTTAT-3'). Aliquots
of each clone were also used to inoculate amplifications of CDR3
regions using FW3-specific (ssFW3:
5'-CTGAA[C/G]CTGAGCTCTGTGAC[T/C]-3') and Cμ- or Cγ-specific
oligonucleotides (CμD: 5'-GGAATTCTCACAGGAGACGA-3'; Cγ-140,
as above) to analyze the diversity of the populations under study;
the distribution of CDR3 size variations of several hundred V H
sequences cloned in this analysis was used to produce an
expected distribution of CDR3 sizes for comparison (see Fig. 3 B).
The amplification products were electrophoresed on 0.6× TBE,
5% urea-acrylamide sequencing gels (Long Ranger; J.T. Baker,
Phillipsburg, NJ) and analyzed with a PhosphorImager
(Molecular Dynamics, Sunnyvale, CA) using the ImageQuant software
supplied by the manufacturer. Clones that differed from the
expected size and those clones in lanes adjacent to aberrantly
migrated bands were used to produce plasmid preparations from
which the inserts were sequenced in either direction.

Figure 1. (A) Insertion events from a single tonsil. (B) Deletion
events from a single tonsil. (C) Other insertions: cDNA clones
with insertions isolated from various sources. Sequence data
available from GenBank/DDBJ under accession numbers AF013615
through AF013626.

Scoring of Insertion/Deletion Events. Insertions and deletions
were scored as events per 10^4 CDR nucleotides, that is, nucleotides
within the customary boundaries of CDR1 and CDR2. This unit
was chosen because in the selected populations studied these events
are generally only found in the CDR regions and therefore the
comparison of events per total nucleotides would be misleading.
In the PAGE analysis, each V H 4-39 FM clone included only the
CDR1 (21 nucleotides) within a total of 230 nucleotides/clone, 
whereas each V H 6 FM clone was only 166 nucleotides but in- 
cluded both the CDR1 and CDR2 (75 CDR nucleotides). In the 
sequencing analysis, various B cell populations were analyzed in- 
volving a wide range of overall lengths. Comparisons of the fre- 
quency of insertions/deletions just within the CDRs allowed for 
a more standardized and quantitative analysis, and for more free- 
dom in experimental design. 
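
The normalization described above can be sketched as follows. The per-clone CDR and amplicon lengths are the ones given in the text; the clone counts in the example are hypothetical and serve only to show how the denominator is built. Function and variable names are illustrative assumptions.

```python
# CDR nucleotides and total amplicon length per clone, as stated in the text.
AMPLICONS = {
    "VH4-39": {"cdr_nt": 21, "total_nt": 230},  # CDR1 only
    "VH6": {"cdr_nt": 75, "total_nt": 166},     # CDR1 + CDR2
}


def cdr_nucleotides(clone_counts):
    """Total CDR nucleotides screened, given clones analyzed per gene."""
    return sum(AMPLICONS[gene]["cdr_nt"] * n for gene, n in clone_counts.items())


def events_per_1e4(events, cdr_nt):
    """Insertion/deletion events per 10^4 CDR nucleotides."""
    return events / cdr_nt * 1e4


# Hypothetical clone counts, for illustration only.
example_counts = {"VH4-39": 100, "VH6": 100}
total = cdr_nucleotides(example_counts)
print(total, "CDR nucleotides screened in the hypothetical example")

for gene, spec in AMPLICONS.items():
    share = spec["cdr_nt"] / spec["total_nt"]
    print(f"{gene}: {share:.0%} of each amplicon is CDR sequence")
```

Because the CDR share of the two amplicons differs several-fold, counting events per total nucleotide would weight the two assays very differently, which is the reason given for scoring per CDR nucleotide.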

Baculovirus Expression System. Cloning and coexpression of
clone pg86 and κ light chain FS6κ in the baculovirus expression
system was performed as previously described (17). Recombinant
Autographa californica nuclear polyhedrosis virus (AcMNPV) was
cloned using the pH360NX transfer vector and expressed in Sf9
cells.

Capture ELISA for γ Heavy Chain and κ Light Chains.
Expression of recombinant antibodies of clone pg86 coexpressed with κ
light chain FS6κ was measured by capture ELISA. Wells were
coated with goat anti-human IgG and incubated with supernatant
of recombinant pg86/FS6κ added in serial twofold dilutions.
Bound antibody was detected using alkaline phosphatase-conjugated
goat anti-human IgG or goat anti-human Cκ. After 1-h
incubation at 37°C, phosphatase substrate was added and
absorbance was measured at 405 nm in an ELISA plate reader.



Results 

Insertions and Deletions into Immunoglobulin V H Genes.
In a large scale analysis of V H genes from both the IgM and
IgG compartments of B cell subpopulations separated from
a single human tonsil, six clones that contained DNA
insertions or deletions were isolated. These insertions and
deletions were apparently selected in that they involved
nucleotide triplets or multiples of nucleotide triplets, leaving the
cDNAs (transcripts) in frame, and they were localized to the
CDR1 and CDR2 (Fig. 1, A and B). The six clones with
insertions or deletions were identified from the sequencing
of 395 cDNA clones (~110,000 nucleotides) from GC and
memory B cell subpopulations, resulting in a frequency of
<2% of clones analyzed (~1 event/18,000 nucleotides).
All six events were in IgG transcripts. Two events were
obtained from IgD−CD38+ GC and four events from
IgD−CD38− memory cell populations. None of the IgM V H
cDNAs analyzed from this tonsil had insertions or deletions,
although we have observed such events in IgM transcripts
in the past and in subsequent analyses, as described below.
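
Two of the quantities in this paragraph are simple to check: whether an insertion or deletion preserves the reading frame, and how many nucleotides were sequenced per event. The sketch below does both. The function names are hypothetical, and the lengths passed to `is_in_frame` are illustrative examples rather than a list of the observed events.

```python
def is_in_frame(indel_length_nt):
    """True if an insertion/deletion of this length keeps the reading frame."""
    return indel_length_nt > 0 and indel_length_nt % 3 == 0


def nucleotides_per_event(total_nt, events):
    """Average number of sequenced nucleotides per observed event."""
    return total_nt / events


# Illustrative lengths: triplets and multiples of triplets stay in frame.
for length in (3, 6, 18, 4):
    status = "in frame" if is_in_frame(length) else "frameshift"
    print(f"{length} nt: {status}")

# Approximate totals quoted in the text: ~110,000 nucleotides, six events.
print(round(nucleotides_per_event(110_000, 6)))
```

With the approximate total of 110,000 nucleotides and six events, the result is on the order of 18,000 nucleotides per event, in line with the figure quoted above.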
Table 1. cDNA and Germline Clones Isolated

... on Matsuda and Honjo (37)

Figure 2. Comparison of the CDR1s of the human V H 4 germline
genes. The primary variability between V H 4 family members is 3-6-bp
size variances in the CDR1s, which is similar to the short insertions and
deletions that we attribute to somatic hypermutation in the selected B cell
populations studied in this report.



The Insertions and Deletions Are Not Germline Encoded.
The analysis described above focused on the V H 4 gene
family, which consists of 10-14 members/genome, varying
slightly between individuals (16, 18). As shown in Fig. 2,
the major difference between V H 4 genes involves the length
of CDR1. Because genomic diversity between V H 4 family
members resembles the events described in this paper, we
had to rule out possible alternative explanations for these
events, such as: (a) different alleles of the detected genes; (b)
rarely expressed or otherwise unknown V H 4 gene family
members; or (c) hybrids between known and detected V H
genes and/or other artifacts of the experimental system. To
address these issues, both the expressed and genomic
repertoires from this tonsil were characterized. As indicated in
Table 1, 2 out of 118 V H 4-39, 2 out of 49 V H 4-31, 1 out of
87 V H 4-34, and 1 out of 45 V H 4-59 cDNA clones contained
insertion/deletion events. cDNA clones were judged as
unique isolates based on CDR3 analysis, and the few
isolates that appeared to be clonally related differed in their
patterns of somatic mutation beyond the level explainable
by reverse transcription and PCR errors (maximum: >1
mutation/500 nucleotides of V H gene sequence as
previously described [4]).

To characterize the genomic repertoire of the initial ton- 
sil, 80 germline V H 4 gene clones were isolated and se- 
quenced (Table 1), which encompassed all 14 V H 4 family 
members or alternate alleles represented in the 446 cDNA 
clones analyzed from all of the tonsillar B cell subsets. In 
the course of this study, we isolated the germline counter- 
part of a novel V H 4 gene segment for which transcripts had 
been found. In addition, germline genes corresponding to 
two apparently functional V H 4 genes not found as cDNA 
clones in this analysis were isolated, as well as one nonfunc- 
tional V H 4 gene and a divergent polymorphism of a known 
V H 4 pseudogene. The proposed germline counterparts of 
each of the V H 4 genes containing insertion/deletion events 
were isolated from 4 to 11 times (Table 1). Eight independent
genomic isolates of V H 4-31 and of V H 4-39 were cloned.
V H 4-34 and V H 4-59 were isolated 11 and 4 times,
respectively. No germline genes were isolated that could have
encoded the insertion/deletion events described.

To further be certain that the insertion/deletion events
described herein were not germline encoded, two sets of
PCR primers were designed to specifically recognize: (a) 
the exact sequence of the events; (b) the predicted, unmu- 
tated, germline sequence corresponding to the cDNAs con- 
taining insertion and deletion events. These primers were 
used to amplify genomic DNA from this individual, yield- 
ing negative results (data not shown). The unique nature of 
these events relative to both the expressed and genomic 
repertoire and our inability to amplify genomic counter- 
parts for these events by PCR suggest that they are not germ- 
line encoded. 

The Proposed Insertion/Deletion Events Are Not the Result of 
(V H /V H) Recombination. As in most V gene repertoire
analyses, we detected hybrid V H sequences that could be 
the result of either PCR splicing by overlap extension arti- 
facts, or reciprocal homologous recombination between un- 
rearranged V genes (19). However, none of these likely arti- 
factual events were altered in size such that they resembled 
the insertion or deletion of DNA described above. A num- 
ber of artifacts of this type had been isolated in the cDNA 
analysis as well; such artifacts are common to V gene analy- 
ses (20). The cDNA isolates with deletion and insertion 
events were stringently compared to all germline and 
cDNA isolates and were found to be unique relative to both 
the expressed and germline V H 4 gene repertoires of this in- 
dividual, supporting a somatic origin for their occurrence. 

The Insertions and Deletions Are Associated with Somatic
Hypermutation. To determine whether or not these
insertion/deletion events were associated with somatic
hypermutation, we analyzed their occurrence in unmutated FM
transcripts. This was done using either direct sequencing or
PCR amplification of portions of the V H genes spanning
the CDRs, followed by size comparisons on polyacrylamide
gels (Fig. 3). Any clones that ran aberrantly, and the
clones in adjacent lanes, were sequenced (75 out of the 485
clones). None of these 75 clones were related based on
CDR3 homology. To ensure that the remaining 410 FM
clones were polyclonal, the CDR3s were PCR amplified
and loaded on the sequencing gels alongside the
V H gene amplification products for size comparisons (Fig.
3 A). The size distribution of these CDR3s was similar to
that of ~500 V H gene sequences analyzed in this study
(Fig. 3 B), providing evidence that our FM sample is polyclonal.

The six events detected from a single tonsil were isolated
from 395 mutated cDNA clones (25,482 CDR nucleotides),
corresponding to a frequency of 2.35 events/10^4 CDR
nucleotides. This is significantly different (p = 0.014 by a
one-sided χ² test) from the analysis of unmutated
FM-derived clones (25,515 CDR nucleotides) that yielded no
insertions or deletions (Table 2).
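
The comparison above can be laid out as a 2×2 table of CDR nucleotides with and without an event in the mutated and FM populations. The sketch below computes an uncorrected Pearson χ² statistic and its upper-tail probability for one degree of freedom. It rests on assumptions not confirmed by the text: each CDR nucleotide is treated as an independent trial, no continuity correction is applied, and the exact variant of the test used in the study is not known. The function names are hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi2_upper_tail_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Counts from the text: events and CDR nucleotides per population.
mutated_events, mutated_nt = 6, 25_482
fm_events, fm_nt = 0, 25_515

chi2 = pearson_chi2_2x2(
    mutated_events, mutated_nt - mutated_events,
    fm_events, fm_nt - fm_events,
)
p_value = chi2_upper_tail_df1(chi2)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Under these assumptions the statistic is close to 6.0 and the tail probability is close to 0.014, which is of the same size as the reported value; a different test variant would shift the number somewhat.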

In the course of the analysis described above, we isolated
one IgM clone containing a 6-nucleotide insertion into
framework (FW)3 (see below). We believe that this clone
is part of the mutated GC or memory repertoire because it
contained 4 bp substitutions in addition to the insertion. In
this study, the B cell populations analyzed were 95-98%
pure, and the FM B cell subpopulation could therefore
include between 2 and 5% contaminating clones, that is,
IgM-expressing cells not from the naive population that
can therefore be somatically mutated. However, none of
the unmutated FM clones analyzed had insertions or
deletions.

Table 2 footnotes. Clones analyzed by hot-PCR/PAGE assay as described in the text.
CDR nucleotides are those within the customary bounds of the CDR1 and CDR2 (see
Materials and Methods for a more detailed explanation). ... otides/10^4) = 2.35.

Figure 4. The insertions and de... DNA sequence. (A) The insertions ...
adjacent sequence ... Panel titles: (A) Insertion events; (C) Long
duplication/insertion.

Figure 5. ELISA assays ... the FW1/CDR1 ... Clone pg86 (IgG heavy chain) ...
chain FS6κ in insect ... ELISAs. Wells were coated with goat anti-human IgG.
Supernatant ... twofold dilutions. Bound antibody was detected with
phosphatase-conjugated goat anti-human IgG (A), and goat anti-human Cκ (B).

Other Insertions and Deletions into V H Genes. We have
observed similar instances of insertions and deletions into the
coding regions of apparently functional immunoglobulin V
genes, including: (a) a V H 6 IgM isolate containing a triplet
duplication/insertion into the CDR1 in addition to several
bp substitutions (Figs. 1 C and 4 A), which was derived
from a human hybridoma secreting high affinity mAb
against Bordetella pertussis (21, 22); (b) a 6-nucleotide
insertion into the FW3 region of a mutated IgM V H 6 gene,
representing the only insertion or deletion observed outside
the CDRs (Figs. 1 C and 4 A, clone tm121); and (c) an
18-nucleotide duplication/insertion into a human plasma cell
cDNA transcript at the boundary between the FW1 and
CDR1 (Figs. 1 C and 4 C), doubling the length of this
hypervariable loop. The viability of clone pg86 was tested by
expressing it in the baculovirus system in association with a
κ light chain-encoding construct (FS-6κ; Fig. 5). The
efficient expression, secretion, and pairing with light chain in
the baculovirus system suggest that the product of clone
pg86 is a functional heavy chain despite the large duplication.

The Insertions and Deletions Are Related to the Surrounding
Sequence. As shown in Fig. 4, the insertions reported are
duplications of the immediately adjacent sequence, and the
deletions involve elements of repetitive tracts. In addition,
a higher incidence of these events involve sequence motifs
that resemble intrinsic hotspots of somatic hypermutation
(12, 23-27): (a) four of eight events involved the serine
codon AGC that has been reported as the "hottest" of
hotspots (24-27) (Fig. 4, sequences HBp2, g187, g188, and
g86); (b) two events involved TAC motifs (Fig. 4, g192 and
g64); and (c) two events involved the motif AAC (Fig. 1,
g144 and tm121). In general, all of the clones found to
contain insertions and deletions were highly mutated (Fig.
1). Several of these clones had bp substitutions clustered
with the insertions or deletions (Figs. 1 and 4). The plasma
cell transcript depicted in Fig. 4 C contained an
18-nucleotide insertion that duplicated the 5' adjacent sequence.
The central nine nucleotides of the duplicated sequence
form a partial palindrome (..GGtGaCtCC). This clone
was mutated (G to A at position 80 and an A to T at
position 85) before the duplication/insertion event, as these
were perpetuated in the inserted sequence.
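
The two sequence features discussed above, duplication of the immediately adjacent block and the presence of AGC, TAC, or AAC motifs, can be checked mechanically. The sketch below is a minimal illustration only: the function names are hypothetical and the toy sequence is invented for the example, not taken from any clone in this study.

```python
HOTSPOT_MOTIFS = ("AGC", "TAC", "AAC")  # motifs named in the text


def is_adjacent_duplication(seq, start, length):
    """True if seq[start:start+length] repeats the block immediately 5' of it."""
    if start < length or start + length > len(seq):
        return False
    return seq[start - length:start] == seq[start:start + length]


def motif_positions(seq, motifs=HOTSPOT_MOTIFS):
    """Map each motif to the 0-based positions where it occurs."""
    return {
        motif: [i for i in range(len(seq) - len(motif) + 1)
                if seq.startswith(motif, i)]
        for motif in motifs
    }


# Invented toy sequence: the AGC at index 6 duplicates the AGC at index 3.
toy = "GGTAGCAGCTACTGG"
print(is_adjacent_duplication(toy, start=6, length=3))
print(motif_positions(toy))
```

A real analysis would first align each clone to its proposed germline counterpart to locate the inserted block; that alignment step is outside the scope of this sketch.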



Discussion 

Somatic modification of V genes encoding immunoglobulin
and T cell receptors recapitulates most mechanisms
observed in the evolutionary diversification of DNA: (a) V
gene recombination, including imprecise junctions, P
nucleotides, and untemplated N nucleotide addition; (b) gene
conversion; and (c) bp substitutions in Ig somatic
hypermutation. The insertion and deletion of nucleotides is another
means for the evolutionary diversification of DNA, and has
been proposed as an explanation for unusual V gene
sequences in the past (Table 3). In this study, we show that
insertions and deletions are associated with the somatic
hypermutation process.

Complexities of the Analysis of Insertions and Deletions into V
Genes. The formal characterization of these events has
been a daunting task because of their low frequency and the
complexity of the germline V H repertoire. According to
our study, these events occur in <2% of somatically
mutated clones. As shown in Fig. 2, the primary variability
between V H 4 family members is 3-6-bp size variances in the
CDR1s, which is comparable to the short insertions and
deletions that we attribute to somatic hypermutation (in
selected B cell populations). The similarity between
evolutionary diversity and somatic diversification was expected, as
the molecules are likely subject to the same functional and
structural constraints. This has made it difficult to determine
whether these events were generated somatically, versus
germline encoded, or if they were artifacts of the experimental
system: they could result from homologous recombination
between alternate alleles or imperfect recombination
between identical alleles, or they could have occurred during
B cell replication independent of somatic hypermutation.
In fact, V H genes may exhibit particularly unstable
sequence characteristics evolved to help support both
germline diversity and the generation of somatic mutations, as
suggested by the identification of intrinsic hotspots of
somatic hypermutation within the CDRs of V genes (25, 26).
Perhaps the area of greatest contention in this complex
system remains the possibility that these low frequency events
are artifacts of the experimental manipulations performed,
the AMV-RT, Taq, or PFU polymerases, and/or the cloning.

The Insertion/Deletion Events Are The Result of the Somatic 
Hypermutation Process. Our system addresses several key 
issues that associate the occurrence of insertions and dele- 
tions to the somatic hypermutation process, (a) Six of the 
nine insertions/deletions were identified within the V H 4 
gene repertoire of a single tonsil, providing an experimen- 
tal system that could be characterized extensively as de- 
scribed below, (b) All of the insertion/deletion events re- 
ported involved triplets or multiples of triplets, leaving the 
transcripts in frame and therefore functional, and eight of 
nine events reported were localized to the CDRs. As with 
somatic point mutations, no insertions or deletions were 
observed in the 80 to 120 nucleotides of constant region 
(C|x or C7) DNA sequenced with each cDNA clone. 
These hallmarks of somatic hypermutation and selection ar- 
gue strongly that these events are not artifacts, (c) The B 
cells analyzed were processed and separated into highly 
pure, mutated B cell populations including GC (IgD" 
CD38+) and memory (IgD-CD38-) B cells, and an unmu- 
tated FM B cell population (IgD~CD38~), making it possi- 
ble to focus our analysis on the mutated populations and 
use the unmutated population as a negative control, which 
in turn allows the statistical association of the observed in- 
sertion and deletions to the somatic hypermutation process 
(P = 0.014). In addition, the isolation of four of the inser- 
tion/deletion events from memory B cells provides evi- 
dence that these events did not result from artifacts related 
to contamination from endonucleolytically cleaved DNA 
from the apoptotic GC cells, id) Seven of nine events re- 



ported in this study involved 7 heavy chains that contain 
nearly twice the mutations of (x heavy chains (4), further 
correlating the events described here to somatic hypermu- 
tation. (e) As discussed below, the insertion/deletion events 
described tended to involve sequence motifs resembling 
previously described hotspots of somatic hypermutation, pro- 
viding evidence that these events occur by the same pro- 
cess. 0) Finally, we extensively analyzed the V H 4 gene fam- 
ily of the tonsil donor at both the expressed and genomic 
levels, facilitating the assignment of the insertions/deletions 
as somatic rather than germline encoded. 6 of the clones 
with insertions and deletions were unique among 395 V H 4 
cDNA clones sequenced from a single tonsil, including many 
independent isolates of each of the V H 4 genes expressed 
(Table 1). In addition, we were unable to isolate genomic 
templates for any of the insertion or deletion events either 
by PCR or through the extensive characterization of the 
genomic V H 4 repertoire of the tonsil donor (Table 1). Tem- 
plating of these events from any other V H gene family can 
also be ruled out as members of the seven human V H gene
families differ significantly in the CDR sequences where 
the events described had occurred. 
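
The reading-frame argument in point (b) can be illustrated with a short sketch. The function name and the example lengths below are illustrative assumptions, not data from the study: an insertion or deletion keeps the transcript in frame only when its length is a multiple of three nucleotides.

```python
def is_in_frame(indel_length_nt: int) -> bool:
    """Return True if an insertion/deletion of this many nucleotides
    preserves the reading frame (length is a positive multiple of 3)."""
    if indel_length_nt <= 0:
        raise ValueError("indel length must be a positive number of nucleotides")
    return indel_length_nt % 3 == 0


# Hypothetical event lengths in nucleotides (not taken from the paper).
for length in (3, 6, 18, 4):
    status = "in frame" if is_in_frame(length) else "frameshift"
    print(f"{length} nt: {status}")
```

A single-codon event (3 nt) or a six-codon insertion (18 nt) passes the check, whereas a 4-nt event would shift the frame.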

Structural and Functional Considerations of Insertions and De- 
letions into V H Genes. The events involving the insertion 
or deletion of a single amino acid from the CDR1 or 
CDR2 would not be expected to profoundly alter the back- 
bone structure of these molecules, as the CDRs are the 
most malleable portions of antibodies. The clone g80 has 
two of the five amino acids that are customarily considered 
its CDR1 deleted, leaving only three amino acids to form 
this hypervariable loop (Fig. 1 B). Thus, this is one of the 
shortest CDR1s reported to date. The clone tml21 has
two amino acids inserted into the FW3 region. The por- 
tion of the FW3 where this insertion occurred is believed 
to be solvent exposed and corresponds to the region where 
the B cell superantigen staphylococcal protein A binds to 
most V H 3-encoded Ig molecules (28); therefore, it is likely
that the insertion into this V H 6 clone can be tolerated as a 
loop or bulge on the molecule's surface. The most complex 
structural change observed in our study involved clone 
pg86, with a six amino acid insertion at the FW1/CDR1 
junction that would presumably double the length of this 
hypervariable loop and require dramatic structural accom- 
modation. However, we were able to express this heavy 
chain and found it paired with light chain, indicating that it 
is likely functional (Fig. 5). The clone HBp2, containing a 
triplet insert into its CDR1, is particularly interesting be- 
cause it has a known specificity. This V H 6 gene was isolated 
from a human B cell hybridoma with anti-Bordetella pertus- 
sis specificity (21, 22). Clone HBp2 has also been expressed 
in the baculovirus system and is fully functional. We are 
currently performing mutational analysis of this heavy chain 
molecule to determine if the additional inserted amino acid 
plays a role in the affinity and/or specificity of this antibody.

Analysis of Insertions and Deletions Reported in the Litera- 
ture. Various groups have reported a number of insertion 
and deletion events (Table 3). Virtually all of the insertions 
and deletions reported from somatically mutated V genes
involved the untranslated regions or occurred in silent pas- 
senger transgenes. 19 out of 25 insertions or deletions into 
somatically mutated genes involved predominantly repeti- 
tive elements, or in several cases other sequence patterns as- 
sociated with secondary structures such as internal homolo- 
gies or inverted repeats (Table 3). With the inclusion of the 
9 events described in this work, 28 out of 34 insertions and 
deletions involved such elements. Thus, the proximity of 
sequence elements that can be predicted to cause secondary 
structural changes in the DNA seems to be a hallmark of 
insertions and deletions into somatically mutated V H genes.

A Model for the Occurrence of Insertions and Deletions during 
Somatic Hypermutation. The evidence for the involvement 
of DNA secondary structure in the production of insertion 
or deletion mutations during somatic hypermutation, as sug- 
gested in 1986 by Golding et al. (29), now seems unequiv- 
ocal. The insertions and deletions described in our study, 
and those illustrated in Table 3, occur in a predictable fash- 
ion, involving sequence motifs that could form loop inter- 
mediates reminiscent of the replication slippage model of 
Streisinger et al. (30) and Ripley and Glickman (for review
see 31) as presented in Fig. 6. Such mutations are postu- 
lated to occur when DNA polymerase slips or stutters and 
the newly synthesized strand shifts on the template and re- 
anneals to an adjacent repetitive element, producing un- 
paired loop intermediates localized to one or the other 
strands. If this unpaired loop intermediate is not repaired 
then it will be perpetuated as an insertion of an instance of 
the repetitive element if in the daughter strand, or a dele- 
tion if in the template strand. 
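
The slippage model described above can be sketched as a toy string operation. This is a minimal illustration under stated assumptions: the function `slip`, its parameters, and the example sequence are hypothetical and only mimic the outcome of an unrepaired loop, not the enzymology.

```python
def slip(sequence: str, unit: str, loop_in: str) -> str:
    """Model one unrepaired slippage event at the first tandem run of `unit`.

    loop_in="daughter": the unpaired loop sits in the newly synthesized
    strand, so one extra copy of the repeat unit is kept (insertion).
    loop_in="template": the loop sits in the template strand, so one
    copy of the repeat unit is skipped (deletion).
    """
    start = sequence.find(unit * 2)  # require at least two adjacent copies
    if start < 0:
        raise ValueError("no tandem repeat of the unit found")
    if loop_in == "daughter":
        return sequence[:start] + unit + sequence[start:]
    if loop_in == "template":
        return sequence[:start] + sequence[start + len(unit):]
    raise ValueError("loop_in must be 'daughter' or 'template'")


# Hypothetical sequence with a tandem AGC repeat.
germline = "GGTAGCAGCTAC"
print(slip(germline, "AGC", "daughter"))  # GGTAGCAGCAGCTAC (one unit gained)
print(slip(germline, "AGC", "template"))  # GGTAGCTAC (one unit lost)
```

Because the repeat unit here is a triplet, both outcomes change the length by exactly one codon.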

A Possible Correlation to Intrinsic Hotspots. A higher fre- 
quency of somatic hypermutation has been reported to oc- 
cur at sequence motifs referred to as intrinsic hotspots (for 
review see reference 12). Interestingly, every insertion/de- 
letion event reported in our study resembled one of these 
hotspots (AGC, TAC, and AAC; references 12 and 27; Fig. 4).
The analysis of selected populations may have influenced
this tendency because seven out of eight of these
events occurred in the CDRs where it has been shown that 
hotspot motifs are preferentially found (25, 26). Further- 
more, only a weak correlation to hotspots could be found 
for the previously reported insertions/deletions involving 
unselected regions of V loci (Table 3). However, the single 
event found in this analysis that occurred outside of the 
CDRs in FW3 (clone tml21, Figs. 1 C and 4 A), also in- 
volved a tandem of possible hotspots (AAG, AAC). A more 
extensive and directed analysis is required to fully address 
this issue. 
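
A simple way to look for the motifs named above in a nucleotide sequence is sketched below. The motif list is taken from the text; the function name and the test sequence are assumptions made for illustration only.

```python
HOTSPOT_MOTIFS = ("AGC", "TAC", "AAC")  # motifs named in the text


def find_motifs(sequence, motifs=HOTSPOT_MOTIFS):
    """Return sorted (position, motif) pairs, 0-based; overlapping hits allowed."""
    seq = sequence.upper()
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((start, motif))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Hypothetical sequence, not taken from any clone in the study.
print(find_motifs("GGTAGCAGCTACAAC"))
# [(3, 'AGC'), (6, 'AGC'), (9, 'TAC'), (12, 'AAC')]
```

Comparing hit positions with the coordinates of an insertion or deletion would show whether an event falls on or next to one of these motifs.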

Implications for the Molecular Mechanism of Somatic Hyper- 
mutation. The instability of repetitive tracts during DNA 
replication is a hallmark of defects in postreplicative mis- 
match repair (33), and the locus-specific downregulation of 
DNA mismatch repair in response to UV irradiation has recently
been reported for immunoglobulin V H genes in
freshly sorted GC B cells (CD38+IgD−) compared to mantle
zone B cells (CD38−IgD+; reference 34). In a recent
study by Tran et al. (35), it was shown that tract instability 
of homonucleotide runs associated with mismatch repair 
defects occurs more frequently in long than in short runs.
These authors suggested that if loop intermediates occur in 
long repetitive tracts (>8 bp for a homonucleotide run) 
they could involve a distal repetitive element out of reach 
of the polymerase proofreading activity and only be sub- 
jected to mismatch repair. However, for short repetitive 
tracts, as for the events reported in this analysis, loop inter- 
mediates can only occur proximal to the polymerase com- 
plex and are therefore subjected to both polymerase proof- 
reading and mismatch repair mechanisms. 
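
The distinction between long and short repetitive tracts can be made concrete with a small run finder. This is a sketch under stated assumptions: the 8-bp threshold is the one quoted above for homonucleotide runs, while the function names and the example sequence are hypothetical.

```python
from itertools import groupby


def homonucleotide_runs(sequence, min_length=2):
    """Return (start, base, length) for every run of identical bases
    that is at least `min_length` long (0-based start)."""
    runs = []
    position = 0
    for base, group in groupby(sequence.upper()):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


def is_long_run(length, threshold=8):
    """Long tract in the sense quoted above: more than `threshold` bp."""
    return length > threshold


# Hypothetical sequence: one 10-bp A run and one 3-bp G run.
example = "GC" + "A" * 10 + "TGGGC"
for start, base, length in homonucleotide_runs(example):
    kind = "long" if is_long_run(length) else "short"
    print(start, base, length, kind)
```

Under the argument above, only the short class would be expected to sit within reach of both polymerase proofreading and mismatch repair.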

All 9 events in this analysis, and 19 out of 25 events from 
the literature (28 out of 34 insertions and deletions re- 
ported), appeared to result from secondary structural inter- 
mediates. Loop intermediates proximal to the polymerase 
complex during DNA polymerization should be repaired 
by the polymerase proofreading mechanisms immediately, 
or by the postreplicative DNA repair systems. This analysis 
suggests the following characteristics for the polymerization 
process during somatic hypermutation. (a) The polymerase 
interacts with the V locus in a particularly unstable or 
"loose" fashion, especially when hotspot motifs or elements 
capable of forming secondary structures are encountered, 
allowing bp substitutions in most instances, and insertions 
or deletions via polymerase slippage at a much lower frequency;
(b) it has limited proofreading capabilities; and (c)
there is a downregulation of postreplicative mismatch re- 
pair. An efficient means to downregulate mismatch repair 
during somatic hypermutation could be through the lack of 
differentiation of the template and progeny strands for the 
mismatch repair system; lack of strand differentiation has 
been shown to increase the rate of mutations introduced 
(36). Such a system would be advantageous for the locus- 
specific V gene somatic hypermutation in that it could in- 
volve alterations of a single enzymatic complex (polymerase 
complex) rather than multiple systems (proofreading and 
mismatch repair). Another system, which would have the 
same advantage, i.e., the alteration of a single complex,
would be the alteration of a DNA repair system such as 
transcription-coupled repair to be the somatic mutator, as 
suggested in recent studies (13). Alternatively, the inser- 
tions and deletions might result solely from a downregula- 
tion of postreplicative mismatch repair at the V locus in the 
rapidly proliferating centroblasts that are undergoing so- 
matic hypermutation or due to a polymerase enzyme with 
such a high fault rate as to overwhelm any repair. 

All currently accepted models of somatic hypermutation, 
whether related to DNA excision-repair-like systems or 
transcription-repair, or to DNA polymerization or reverse 
transcription, involve transcriptional activation involving 
cis-factors in the V locus (enhancers, etc.) followed by the
activity of unknown polymerase enzymes of some type.
This analysis does not refute or corroborate any of these
models directly, but it does provide further characterization 
of the polymerization system involved, based on the types
of mutations observed and on the molecular biology that is
known to cause such mutations. This analysis and the
model presented here provide further information or crite- 
ria to be contemplated as the various possible polymerase 
systems involved are considered. 

Conclusions. Insertions and deletions into immunoglob- 
ulin V H genes during somatic hypermutation are additional 
means by which the immunoglobulin repertoire can be di- 
versified. These events display characteristics supporting mod- 
els of somatic hypermutation involving a particularly unsta- 
ble or error-prone polymerase to allow the introduction of 
mutations, and involving the downregulation of DNA repair 
to allow the perpetuation of these mutations. Additionally, 
we show that these events tend to involve sequence motifs 
resembling intrinsic hotspots of somatic hypermutation, sug- 
gesting that the polymerase complex is destabilized in a se- 
quence-specific manner to allow preferential mutation at these 
sequence elements. 

Insertions and deletions of nucleotides in the genes
encoding the variable domains of antibodies are natural 
components of the hypermutation process, which may 
expand the available repertoire of hypervariable loop 
lengths and conformations. Although insertion of amino 
acids has also been utilized in antibody engineering, 
little is known about the functional consequences of 
such modifications. To investigate this further, we have 
introduced single-codon insertions and deletions as well 
as more complex modifications in the complementarity- 
determining regions of human antibody fragments with 
different specificities. Our results demonstrate that 
single amino acid insertions and deletions are generally 
well tolerated and permit production of stably folded 
proteins, often with retained antigen recognition, de- 
spite the fact that the thus modified loops carry amino 
acids that are disallowed at key residue positions in 
canonical loops of the corresponding length or are of a 
length not associated with a known canonical structure. 
We have thus shown that single-codon insertions and 
deletions can efficiently be utilized to expand structure 
and sequence space of the antigen-binding site beyond 
what is encoded by the germline gene repertoire. 



Antibodies are highly specific receptors of the immune system
that also have a great potential as reagents in biological
chemistry and as therapeutic agents. The part of the antibody
that makes contact with the antigen is comprised of two variable
(V) 1 domains, the heavy (H) and the light (L), which both
are made up of a two-β-sheet framework. From this framework,
six complementarity-determining region (CDR) loops, three 
from the light domain and three from the heavy domain, pro- 
trude and make up the antigen-binding site (1,2). Five of these 
CDR loops generally adopt only a limited number of backbone 
conformations, so-called canonical structures (reviewed in Ref. 
3), which are determined by the lengths of the loops and by the 
presence of specific key residues. The antigen specificity of the 
binding site is mainly determined by the sequence and confor- 
mation of these CDR loops. 

Antibody diversity is generated by the imprecise recombina- 
tion of two or three sets of germline gene segments and by the 
combination of different heavy and light domains (4). The diversity
is further increased by the process of somatic hypermutation
(5) and by receptor editing and revision (6). As the
germline variable gene repertoire encodes a rather limited 
number of CDR loop lengths (IMGT, the international ImMu- 
noGeneTics data base, Ref. 7), the number of observed canon- 
ical structures is similarly limited. However, it was recently 
discovered that B cells evolve the genes encoding immunoglob- 
ulin V domains not only by nucleotide substitution but also 
through an additional mechanism of insertion and deletion of 
nucleotides during the hypermutation process (8-11). This 
mechanism has the potential to expand the available repertoire 
of loop lengths and conformations if the insertions and dele- 
tions involve entire codons and occur at positions in the 
sequence that can tolerate such modifications. A number of 
examples of seemingly functional insertions and deletions in 
the CDR of both the heavy and light domains of human anti- 
bodies have in fact been encountered lately (Refs. 8 and 12 and 
references therein). Furthermore, we have recently discovered 
that human IGHV 2 germline genes carry features in CDR1 and 
CDR2 that make these regions particularly prone to deletions
of entire codons (12).

The occurrence of insertions and deletions in antibody V 
genes is not only of fundamental interest but is also of biotech- 
nological importance. It has been known for some time that the 
topography of the antigen-binding site is related to the size of 
the antigen (13-15). Three different types of binding sites have 
been described: cavity, groove, and planar, which roughly cor- 
respond to hapten, peptide, and protein, respectively. This re- 
lationship has been further investigated by Vargas-Madrazo et 
al. (16), who have described a correlation between the length of 
the CDR loops and the antigen recognized. According to these 
findings, cleft-like binding sites that recognize small molecules 
are created by long loops (especially the CDRH2 and L1 loops),
whereas planar-binding sites that are specific for large mole- 
cules are formed by short loops. In other words, by modifying 
the loop lengths of an antibody-binding site, it may thus be 
possible to design antibodies optimally suited for recognition of 
a particular class of antigen. Lamminmaki et al. (17) have in 
fact used this approach to modify a murine antibody specific for 
17β-estradiol. They introduced additional residues into CDR2
of the heavy domain and were able to improve the recognition 
of the antigen. This improvement was suggested to be the 
result of a deeper binding site, created through the extension of 
CDRH2, which better accommodated the hapten (17). 

Despite the establishment of insertions and deletions as 
naturally occurring modifications of antibody sequences and 
the use of amino acid insertions for antibody engineering, little 
is still known about the functional consequences of such modifications.
We have therefore created single-codon insertions
and deletions as well as more complex modifications in the
CDR of two human antibody single chain V region fragments
(scFv) specific for a peptide and a hapten, respectively, and
investigated the effects on antigen recognition, thermal stability,
and protein folding. Our results demonstrate that single
amino acid insertions in both CDRH1 and H2 and deletions in
CDRH2 are usually well tolerated and permit production of
folded proteins despite the fact that the modified loops carry
amino acids that are disallowed at key residue positions in
canonical loops of the corresponding length or do not take on a
characteristic length of a known canonical structure. Modifications
of this kind are in other words an efficient mode of
expanding antibody sequence and structure space beyond what
is encoded by the germline gene repertoire, which may enable
targeting of novel or otherwise poorly immunogenic antigens.

2 The immunoglobulin gene names used in this report are according
to the official IMGT/HUGO nomenclature (IMGT, the international
ImMunoGeneTics database, Ref. 7).

EXPERIMENTAL PROCEDURES 

Antibody Frameworks — The frameworks encoding the anti-cytomeg- 
alovirus scFv AE11F and the anti-fluorescein isothiocyanate (FITC) 
scFv FITC8 have been described elsewhere (18-20). The cloning and 
production of the AE11F and AE11F/3-20L1 scFv in Pichia pastoris 
have also been described (21). 

Creation of Insertion and Deletion Variants— Mini-libraries of scFv 
genes carrying codon insertions at various positions were created by the
use of overlap extension PCR with degenerate primers that introduced 
NNK codons. Variants with a deletion were similarly created with 
primers lacking one codon. The AE11F-based variants carrying CDRH1
sequences derived from the IGHV4 subgroup were created using the 
CDR-shuffling technique (22) essentially as described previously 
(21, 23). 
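
The coding capacity of an NNK codon (N = A/C/G/T, K = G/T) can be checked with a few lines of code. This is a minimal sketch using the standard genetic code; the helper names are assumptions for illustration and do not represent the primer design used in the study.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """All codons matching the degenerate pattern NNK."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


codons = nnk_codons()
encoded = {CODON_TABLE[c] for c in codons}
stops = [c for c in codons if CODON_TABLE[c] == "*"]

print(len(codons))           # 32 distinct codons
print(len(encoded - {"*"}))  # 20 amino acids covered
print(stops)                 # ['TAG'], the only stop codon allowed by NNK
```

An NNK position therefore samples all 20 amino acids from 32 codons while admitting a single stop codon, which is consistent with the wide range of inserted residues reported later in the text.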

Production and Purification of scFv Variants — The FITC8 scFv and
all variant scFv were cloned into the pPICZα vector (Invitrogen) with
C-terminal FLAG sequences (24) and produced in P. pastoris as described
previously (21). The mini-libraries encoding AE11F and FITC8
variants were screened for scFv production or antigen binding according
to the colony lift assay by McGrew et al. (25). Briefly, transformed
P. pastoris colonies were lifted onto cellulose acetate filters (Pall 
Gelman Sciences, Ann Arbor, MI) and were grown on top of nitrocellu- 
lose filters, which were placed on methanol-containing plates. After 
48 h of induction, scFv bound to the nitrocellulose filters were detected 
by a combination of anti-FLAG M2 antibody (Sigma) and rabbit anti- 
mouse Ig/horseradish peroxidase conjugate (DAKO A/S, Glostrup, 
Denmark) or FITC-biotin (Sigma) and streptavidin/horseradish perox- 
idase conjugate (DAKO A/S) using the ECL Plus™ Western blotting 
detection reagents (Amersham Biosciences) according to the manufac- 
turer's recommendations. Single colonies were also picked and grown in 
liquid cultures to enable further characterization of the antigen binding 
properties (see below). In addition, a number of scFv variants were 
produced at a larger scale and purified as monomers. The AE11F-based
variants were purified essentially as described previously (21), whereas 
the FITC8-based variants were purified by affinity chromatography on 
a Sepharose resin with FITC-conjugated bovine serum albumin (BSA) 
(kindly provided by Dr. B. Jansson, BioInvent Therapeutic AB, Lund,
Sweden) followed by gel filtration as before. 

Analysis of Antigen Recognition— The reactivity of the scFv variants 
with different antigens, both as crude expression supernatants and as 
purified monomers, was analyzed by enzyme-linked immunosorbent 
assay (ELISA) and by using the BIAcore technology (BIAcore AB, 
Uppsala, Sweden). The AE11F-based clones were tested on BSA,
ovalbumin, streptavidin, and a biotinylated peptide that mimics the 
viral epitope (21) bound via streptavidin and the FITC8-based clones on 
BSA, streptavidin, FITC-BSA, FITC-biotin (bound via streptavidin), 
and a number of irrelevant BSA-coupled haptens obtained from Sigma 
or Biosearch Technologies Inc. (Novato, CA). The ELISA was performed
according to standard protocols with anti-FLAG M2 (Sigma) and rabbit
anti-mouse immunoglobulin/horseradish peroxidase conjugate (DAKO)
to detect bound scFv. The BIAcore measurements and the calculation of 
the reaction rate kinetics were performed essentially as described 
previously (21). 

Differential Scanning Calorimetry (DSC)— DSC measurements were 
performed using a VP-DSC from Microcal Inc. (Northampton, MA) in 
the temperature range 20-90 °C at a heating rate of 60 °C/h. All measurements
were performed in phosphate-buffered saline (PBS), pH 7.4,
containing 0.02% sodium azide at protein concentrations between 0.1
and 0.2 mg/ml with PBS in the reference cell. Prior to protein versus
PBS measurements, PBS versus PBS scans were performed. 

CD Spectroscopy — CD spectra were recorded on a J-720 spectropolarimeter
(Jasco Inc., Easton, MD) in a 2-mm cuvette at a protein concentration
of 0.1 mg/ml in 50 mM sodium phosphate, pH 7.4. Each
sample was scanned two to eight times from 250 to 200 nm at a scan 
speed of 10 nm/min, a resolution of 1 nm, a bandwidth of 1 nm, and a 
sensitivity of 20 millidegrees, and the scans were combined to produce 
the final spectrum. Data are presented as mean residue molar elliptic- 
ity, which was calculated using the mean residue weight of each scFv. 
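
The conversion from a measured CD signal to mean residue molar ellipticity is commonly written as [θ] = θ(mdeg) × MRW / (10 × c × l), with c in mg/ml and l in cm. The sketch below assumes this standard relation; the path length (0.2 cm) and concentration (0.1 mg/ml) are taken from the text, whereas the signal value and the mean residue weight are hypothetical placeholders.

```python
def mean_residue_ellipticity(theta_mdeg, mean_residue_weight, conc_mg_per_ml, path_cm):
    """Mean residue molar ellipticity in deg·cm^2·dmol^-1.

    theta_mdeg: observed ellipticity in millidegrees
    mean_residue_weight: protein molecular mass divided by residue count (Da)
    conc_mg_per_ml: protein concentration in mg/ml
    path_cm: cuvette path length in cm
    """
    return theta_mdeg * mean_residue_weight / (10.0 * conc_mg_per_ml * path_cm)


# Hypothetical reading of -5 mdeg and an assumed mean residue weight of 110 Da.
value = mean_residue_ellipticity(
    theta_mdeg=-5.0,
    mean_residue_weight=110.0,
    conc_mg_per_ml=0.1,   # from the text
    path_cm=0.2,          # 2-mm cuvette, from the text
)
print(round(value))  # -2750
```

Normalizing by the mean residue weight of each scFv makes spectra comparable between variants that differ by one inserted or deleted residue.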

Sequencing and Canonical Structure Classification— The nucleotide 
sequences of the variant scFv clones were determined by automated 
DNA sequencing as described elsewhere (26) after isolation of the 
templates by direct PCR on P. pastoris colonies using vector-specific 
primers. In the case of the CDRH1-grafted clones, the origin of the CDR
was determined using the IMGT/V-QUEST alignment tool at IMGT, the
international ImMunoGeneTics data base (imgt.cines.fr and Ref. 7). All 
sequences were defined and numbered in accordance with the IMGT 
nomenclature and unique numbering (7). Complete sequences of the 
variant scFv from this study can be found in GenBank™ under acces- 
sion codes AF543317-AF543349. The canonical structure classification 
was performed using the software implemented on the Antibodies - 
Structure and Sequence server (www.bioinf.org.uk/abs/chothia.html 
and Ref. 27). 

RESULTS 

The scFv Frameworks — The parent antibody frameworks 
used in this study are both of human origin although there are 
differences in the way they were obtained. The AE11F scFv 
was derived from a monoclonal antibody isolated from a cyto- 
megalovirus-seropositive blood donor (18, 19). It originates 
from the IGHV3-30 and IGKV3-11 genes, which both have 
acquired a number of mutations (21). This scFv recognizes both 
intact glycoprotein B from cytomegalovirus and peptides mim- 
icking the AD-2 epitope (21, 28). The hapten (FITC)-specific 
scFv FITC8 was derived from a synthetic scFv library, which 
had been constructed by shuffling of human CDR sequences 
into a single framework consisting of the human IGHV3-23 
and IGLV1-47 genes (20). The CDR sequences utilized by this
scFv originate from IGHV3-7 and IGHV3-23 in the case of
CDRH1 and CDRH2, IGLV1-40 and IGLV1-40 or IGLV1-50 
in the case of CDRL1 and CDRL2, and IGLV1-47 in the case of 
CDRL3. Except for the CDRL1 loop, which is one residue longer 
than the IGLV1-47 germline length, the CDR loops of the 
FITC8 scFv are of the same length as the loops normally
encoded by the framework genes. As the structures of the 
two scFv have not been determined, the loop structures are 
unknown. However, by analyzing the deduced amino acid 
sequences using the tools at the Antibodies - Structure and 
Sequence server (27), the most similar of the observed canon- 
ical classes were identified (Table I). 

Single-codon Insertions and Deletions — To determine the 
capability of the two antibody frameworks to tolerate length 
modifications in the CDR loops, we made single-codon inser- 
tions in CDRH1 and CDRH2 and a single-codon deletion in 
CDRH2. The modifications involved insertions after positions 
31-33 in CDRH1, insertions after positions 57 and 58 in 
CDRH2, and a deletion at position 58 in CDRH2 (Fig. 1). All 
modifications were introduced at positions corresponding to the 
apices of the loops, i.e. the positions where the natural length 
variation occurs (31). A study of the IGHV germline gene rep- 
ertoire has shown that these parts of the CDR carry repetitive 
sequence tracts, which naturally target them with deletions 
(and possibly also insertions) during the hypermutational proc- 
ess (12). Residues in these regions have also been shown to 
frequently make contact with the antigen in known antibody- 
antigen complexes (15), suggesting that modifications at the 
above mentioned positions will result in an expansion of struc- 
ture space that is relevant for antigen recognition. 

Table I

Examples of scFv clones based on the AE11F and FITC8 frameworks, the canonical class
belonging of the CDR loops, reactivity of the scFv with the original antigens as determined by ELISA or by BIAcore measurements,
and the unfolding temperature of selected clones as determined by DSC
Modification refers to the nature of the changes in loop length; Ins indicates insertion, and Del indicates deletion. Numbering is according to the
IMGT unique numbering (7). Canonical class indicates the combination of canonical structures of CDRH1, H2, and L1 as determined by automatic
canonical structure classification (27). The altered canonical structure is indicated in bold. Antigen recognition: -, negative; ±, weakly positive;
+, positive; ++, strongly positive.

scFv clone       Modification
AE11F            Original sequence
ASV18            Ins Pro-31A
ASV19            Ins Asn-31A
ASV43            Ins Arg-31A
ASV15            Ins His-32A
ASV37            Ins Ile-32A
ASV39            Ins Phe-32A
ASV02            Ins Phe-33A
ASV35            (illegible)
ASV07            Ins Lys-57A
ASV08            Ins Ile-57A
ASV28            Ins Thr-57A
ASV05            Ins Glu-58A
ASV10            Del Val-58
AE11F/3-20L1     Ins CDRL1 b

FITC8            Original sequence
FSV71            Ins Ser-31A
FSV73            Ins His-31A
FSV76            (missing)
FSV81            (missing)
FSV84            Ins Pro-32A
FSV85            Ins Arg-32A
FSV91            Ins Leu-33A
FSV93            Ins His-33A
FSV96            Ins Tyr-33A
FSV51            Ins Ser-57A
FSV52            Ins Ala-57A
FSV56            Ins Leu-57A
FSV43            Ins Thr-58A
FSV46            Ins Arg-58A
FSV61            Del Gly-58

The canonical class, antigen recognition, and unfolding temperature columns are not recoverable; the only surviving canonical class entries read 2-3-2.

a U indicates that the canonical ... of the created loop length is ...
b See text for details regarding the ...
c The automatic canonical class algorithms failed to unambiguously predict a structure for the CDRL1 loop of this scFv. Similarities in amino
acid sequence with Fab 1F7 (PDB entry 1fig) suggest that the loop belongs to canonical structure class 6 (30).

Libraries of scFv clones producing different insertion variants
were screened directly by the use of a colony lift assay (25).
This analysis showed that ~95% of the clones based on the
FITC8 framework had retained their specificity for FITC (data 
not shown). The libraries based on the AE11F framework were 
screened for the production of FLAG-carrying proteins, and a 
similar ratio of clones positive for scFv production was obtained 
(data not shown). Both positive and negative clones from each 
library were sequenced to determine the nature of the modifi- 
cations, and the analysis showed that a wide range of amino 
acids was inserted at the intended positions. To determine the 
effect of these length modifications on the structure of the 
targeted loops, the most similar canonical structures were 
identified by the automatic canonical structure classification 
(27). A number of examples from each insertion library and the 
deletion variants are presented in Table I. 
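
The modification labels used in Table I (for example "Ins Pro-31A" or "Del Gly-58") follow a regular pattern that is easy to parse when tabulating variants. The parser below is a hypothetical helper written for illustration; it assumes the label format seen in the table (Ins or Del, a three-letter residue code, an IMGT position, and an optional insertion letter).

```python
import re
from typing import NamedTuple


class Modification(NamedTuple):
    kind: str      # "Ins" or "Del"
    residue: str   # three-letter amino acid code
    position: int  # IMGT position
    suffix: str    # insertion letter such as "A", or "" if absent


_LABEL = re.compile(r"^(Ins|Del) ([A-Z][a-z]{2})-(\d+)([A-Z]?)$")


def parse_modification(label: str) -> Modification:
    """Parse a label such as 'Ins Pro-31A' or 'Del Val-58'."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized modification label: {label!r}")
    kind, residue, position, suffix = match.groups()
    return Modification(kind, residue, int(position), suffix)


print(parse_modification("Ins Pro-31A"))
print(parse_modification("Del Val-58"))
```

Grouping the parsed records by position separates the CDRH1 insertions (after positions 31 to 33) from the CDRH2 modifications (positions 57 and 58).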

As the AE11F-based libraries were only tested for the production
of FLAG-tagged proteins, they had to be characterized
further to determine whether the scFv were functionally 
folded. This was done by analyzing the antigen-binding prop- 
erties of the modified clones. Although changes in loop struc- 
ture may be associated with a loss of antigen recognition, 
specific recognition of an antigen will confirm that the polypep- 
tide chain is correctly folded as this is a requirement for it to 
function as a framework for the antigen-binding site. Analysis 
of expression supernatants of randomly picked clones (including
the deletion variants) by ELISA or by using the BIAcore
technology confirmed the above finding that the majority of the
FITC8-based clones recognized the original antigen. Importantly,
this analysis showed that most of the AE11F-based
clones had also retained their specificity for the original viral 
antigen (Table I). Furthermore, when tested for binding to a 
number of irrelevant antigens (see "Experimental Proce- 
dures"), none of the clones displayed any cross-reactivity (data 
not shown), demonstrating that the modified scFv clones re- 
tained a high degree of specificity for the original antigens and 
therefore likely also assumed a correct immunoglobulin fold. 

A number of clones of each specificity, chosen to exemplify 
the different modifications, were produced at a large scale to 
study the interaction with the original antigens in detail and 
determine the stability of the purified proteins. BIAcore meas- 
urements with the purified monomers of the ASV07, ASV10, 
ASV35, FSV43, FSV61, and FSV84 clones confirmed the pre- 
viously obtained results with crude expression supernatants 
(Table I and Fig. 2). Furthermore, evaluation of the reaction 
rate kinetics with the original antigen showed that the modi- 
fications did not affect the dissociation rates of the FITC8- 
based clones to any greater extent (Fig. 2B). The thermal 
stability of the purified monomers was determined by DSC, and 
all tested clones displayed unfolding temperatures very similar 
to the parent scFv (Table I), further verifying that the IGHV3- 
derived antibody frameworks tolerate single-codon insertions 
and deletions in CDRH1 and H2 very well.

Fig. 1. Sequences and structures of the scFv frameworks used for production of insertion and deletion variants. A, alignment of the
deduced amino acid sequences of the heavy V domains of the AE11F and FITC8 scFv. CDR-IMGT are boxed, and the location of the insertions and
the deletion made in this study are indicated by arrows and an asterisk, respectively. Amino acid numbering according to the IMGT unique
numbering is shown below the sequences. B, location of the affected sequences as indicated on a structure model of AE11F, which was generated
using the WAM algorithm (29), a determined structure of the protein-specific antibody B7-15A2 (Protein Data Bank entry 1aqk), which originates
from a highly related IGHV gene and has a CDRH3 of the same length as AE11F, and a structure model of FITC8 (20). CDRH3 is shown in red,
whereas residues immediately adjacent to the single-codon insertions and the deletion made in this study are highlighted in blue (residues 31-34
in CDRH1) and green (residues 57-59 in CDRH2), respectively.

As insertions and deletions have been demonstrated to occur
naturally in both heavy and light domain V genes (8), we
decided to extend this study and also evaluate the stability of a
previously produced AE11F-based scFv variant with an insertion
in CDRL1 (AE11F/3-20L1) (21). The modified CDRL1 of
this scFv is identical, except for an additional serine residue, to
the germline gene from which AE11F originates. This clone has
also been demonstrated to recognize both the epitope-mimicking
peptide and intact, recombinant glycoprotein B, albeit
with a lower affinity than the affinity matured AE11F scFv (21,
32). The thermal stability of the AE11F/3-20L1 scFv was determined
as before after purification of monomeric scFv, and the
unfolding temperature was found to be similar to that of the
original scFv (Table I), thus indicating that not only heavy but
also light domain CDR tolerate modifications of this nature well.

Grafting of CDRH1 Loops from Distantly Related IGHV 
Genes — As all of the insertions and deletions described so far 
were introduced at the tips of the hypervariable loops, the parts 
of the immunoglobulin fold that best can be expected to accom- 
modate such modifications, we decided to introduce more ex- 
tensive modifications to investigate the effect of such changes 
of antibody sequence and structure. These modifications were 
introduced into and immediately adjacent to CDRH1 of the 
AE11F framework by the CDR-shuffling technique (22) using 
CDR sequences isolated from activated human B cells. Se- 
quences originating from the IGHV4 subgroup were chosen for 

Table II 
Deduced amino acid sequences, germline gene origin, and canonical structure class belonging of the CDRH1 loops of the AE11F scFv and the 
CDRH1-grafted variants of this scFv 

Amino acid sequences are aligned and numbered in accordance with the IMGT unique numbering (7) and gaps thereby introduced are indicated 
by dashes. Amino acids that are part of the CDR1-IMGT (7) are underlined. Dots indicate identity with the AE11F sequence. Canonical structures 
were determined by automatic canonical structure classification (27). 

[The scFv clone names and canonical structure classes are not legible in this copy; rows are identified by germline gene. The numbering 
header marks IMGT positions 22, 30, 34, 35, 36, 39, 40, and 44.] 

Sequence                            Germline gene 
SCAAS GFIFSEYD-- MHWVRQ             IGHV3-30 
...V. .GSI.SGGYY WS.I..             IGHV4-31 
...V. .YSI.SGYY- WG.I..             IGHV4-b 
...V. .GSI.S.Y-- WS.I..             IGHV4-59 
[...] .GSI.[...] .H--WS.[...]       IGHV4-34 (row partly illegible) 
[...]. .GSI.SGGYS WS.I..            IGHV4-30-2 


the grafting as these are only distantly related to the IGHV3 
CDR and therefore allow for a higher degree of variability. In 
addition, genes from the IGHV4 subgroup encode loops of dif- 
ferent lengths than genes from the IGHV3 subgroup, including 
loops of the same length as the ones created by the single-codon 
insertions in CDRH1, thus enabling a comparison with these 
modifications. Sequencing of randomly picked clones showed 
that seemingly functional, i.e. in-frame and without stop 
codons, IGHV3 genes carrying IGHV4-derived CDRH1 se- 
quences were obtained (Table II). However, when analyzing 
crude expression supernatants of the constructs, it was found 
that all of the clones had lost the original antigen specificity 
and instead acquired a polyreactive character (Fig. 3). 
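
The dot notation used in Table II can be expanded mechanically, which makes the grafted CDRH1 sequences easier to compare. The following is a minimal sketch, not part of the original study: the function name `expand_dots` is hypothetical, the rows are transcribed from the three fully legible variant rows of Table II with spaces removed, rows are keyed by germline gene because the clone names are not legible, and the dash run in the AE11F row is read as two gap columns (an assumption based on the column count of the other rows).

```python
def expand_dots(reference: str, row: str) -> str:
    """Replace each '.' in an aligned row with the reference residue in that column."""
    if len(reference) != len(row):
        raise ValueError("aligned rows must have the same number of columns")
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))


# AE11F reference row and three legible variant rows from Table II.
# '-' marks an alignment gap; reading the AE11F dash run as '--' is an assumption.
AE11F = "SCAASGFIFSEYD--MHWVRQ"
ROWS = {
    "IGHV4-31": "...V..GSI.SGGYYWS.I..",
    "IGHV4-b": "...V..YSI.SGYY-WG.I..",
    "IGHV4-59": "...V..GSI.S.Y--WS.I..",
}

expanded = {gene: expand_dots(AE11F, row) for gene, row in ROWS.items()}
for gene, sequence in expanded.items():
    print(gene, sequence)
```

Keeping the gap characters in the expanded rows preserves the column alignment, so the residues can still be compared position by position with the AE11F sequence.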

To further investigate this polyreactive nature of the 
CDRH1-grafted clones, two of them, E3 and E6, were produced 
at a larger scale and purified as monomers to enable structural 
characterization. These two clones were chosen based on the 
presence of loop lengths different from the one used by the 
parent antibody (Table II). As judged by analytical gel filtra- 
tion, these clones also gave rise to proteins that behaved as 
scFv monomers (data not shown). The overall secondary struc- 
ture was determined by CD spectroscopy and was compared 
with the results obtained with other monomeric scFv. As shown 
in Fig. 4, the spectra of both of the CDRH1-grafted clones 
displayed a strong negative signal near 200 nm, which is indicative 
of unordered polypeptides (33). For comparison, the 
spectra of both the parent scFv and the FITC8 scFv displayed 
a weak negative signal near 217 nm, which is characteristic of 
the β-sheet conformation of antibody domains (Fig. 4). The 
same result was also obtained with clones carrying single-codon 
modifications, such as the AE11F/3-20L1 and the FSV43, 
which gave rise to spectra nearly identical to that of the parent scFv 
(data not shown). When analyzed by DSC, no unfolding temperatures 
could be determined for either the E3 or the E6 scFv, 
suggesting that the proteins were already in an at least partly 
unfolded state. Thus, by inserting these only distantly related 
CDR sequences into the IGHV3 framework, the boundaries 
that define a stable immunoglobulin fold had apparently been 
exceeded. 

DISCUSSION 

Insertions and deletions of nucleotides have recently been 
shown to be an additional mechanism whereby immunoglobulin 
V region genes evolve (8-11), one that may expand 
the available repertoire of antibody hypervariable loop lengths 
and structures. Although sequence modifications of this kind, 
especially insertions, have also been exploited in antibody en- 
gineering, knowledge about the effects of these modifications 
on protein stability and antigen recognition is still limited. 
Such factors are critical as they determine the success of this 
mode of molecular evolution, whether employed by nature or by 
the molecular engineer. To study the functional consequences 
of both insertions and deletions in the CDR of human antibod- 
ies, we have here made single-codon insertions and deletions as 
well as more extensive modifications in the CDR of two anti- 
body fragments with different specificities and assessed the 
thermal stability and the antigen binding properties of the 
resulting proteins. 

The single-codon modifications were well tolerated by the 
two scFv frameworks as determined by the thermal stability 
measurements and the high ratio of functional clones despite 
the fact that they created both loop lengths that do not occur 
normally within the human IGHV3 subgroup and combina- 
tions of loop lengths that do not exist in the human germline 
repertoire. Insertion of one residue in CDRH2 of the two scFv 
studied here creates a loop length (CDR2-IMGT length 9 amino 
acids) that is not naturally encoded by any IGHV genes except 
for the only member of the IGHV6 subgroup (7). This loop 
length has been predicted to have its own distinct conformation 
(canonical structure 5, Ref. 31), but as no immunoglobulin 
encoded by this gene has been structurally determined, this 
canonical structure has not been defined. The insertion of one 
residue in CDRH1 produces a loop length (CDR1-IMGT length 
9 amino acids) that occurs naturally within the human IGHV4, 
but not the IGHV3 subgroup, and which could correspond to 
canonical structure 2 as judged by the automatic canonical 
structure classification. This coexistence of canonical structure 
2 in CDRH1 with canonical structure 3 in CDRH2 (Table I) 
does not occur naturally within the human IGHV germline 
repertoire, although it has been observed in hypermutated 
antibodies with insertions in CDRH1 (8). In addition, the struc- 
ture classification also revealed that a large number of the key 
residue requirements for canonical structure 2 were not ful- 
filled (27), i.e. the thus modified CDRH1 loops either take on 
structures not covered by the described canonical structures or 
adopt the observed structure corresponding to this loop length 
despite the presence of a large number of disallowed amino 
acids at key residue positions. Irrespective of which is the case, 
the insertions in CDRH1, like the rest of the 
single-codon modifications, seem to give rise to scFv that are correctly 
folded and stable. 
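
The loop lengths discussed above follow directly from counting the non-gap residues of an aligned CDR1-IMGT. The sketch below is illustrative only. It assumes that the CDR1-IMGT corresponds to the ten columns between the SCAAS and MHWVRQ stretches of the Table II alignment, uses the expanded sequences of the three fully legible grafted rows, and models a single-codon insertion with a placeholder residue `X` at an arbitrary site, because the actual inserted residues are not given in this section. The function names are hypothetical.

```python
def loop_length(gapped_loop: str) -> int:
    """Count the residues in an aligned loop, ignoring '-' gap columns."""
    return sum(1 for ch in gapped_loop if ch != "-")


def insert_residue(loop: str, index: int, residue: str) -> str:
    """Return the ungapped loop with one residue inserted before `index`."""
    ungapped = loop.replace("-", "")
    return ungapped[:index] + residue + ungapped[index:]


# CDR1-IMGT columns as read from Table II (assumed to be alignment columns 6-15).
CDR1_LOOPS = {
    "AE11F (IGHV3-30)": "GFIFSEYD--",
    "IGHV4-31 graft": "GGSISSGGYY",
    "IGHV4-b graft": "GYSISSGYY-",
    "IGHV4-59 graft": "GGSISSYY--",
}

lengths = {name: loop_length(loop) for name, loop in CDR1_LOOPS.items()}

# Hypothetical single-codon insertion: 'X' and index 4 are placeholders.
inserted = insert_residue(CDR1_LOOPS["AE11F (IGHV3-30)"], 4, "X")

print(lengths)
print(inserted, loop_length(inserted))
```

Under these assumptions the parent loop has 8 residues and a one-residue insertion gives 9, the same length as the IGHV4-b-derived loop, which is the comparison the text draws between single-codon insertions and IGHV4 loop lengths.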

The fact that the loop lengths that were created by the 
single-codon insertions are not part of the IGHV3-encoded rep- 
ertoire does not mean that they are completely unnatural in 
the context of an IGHV3 framework. Apparently functional 
antibodies belonging to the IGHV3 subgroup with insertions in 
CDRH1 and CDRH2 leading to CDR-IMGT loop lengths of 9 
amino acids have in fact been described by others (8, 34, 35). As 
the deletions at position 58 in CDRH2 of both scFv give rise to 
loop lengths that are used by other members of the IGHV3 
subgroup, it is not entirely unexpected that these modifications 

Fig. 3. Clones carrying CDRH1 sequences from distantly related IGHV genes displayed a polyreactive [...] 
AE11F (□), E3 (■), E6 (○), E10 (▲), E11 (♦), and E14 (•) scFv with streptavidin-bound viral peptide (A), [...] 
(D), and uncoated polystyrene wells (E), as determined by ELISA. Relative concentrations of the expression supernatants [...] 
immunoblotting. The coefficient of variation was below 10% for the whole data set. 

Fig. 4. CD spectroscopy indicated an unordered folding of the 
CDRH1-grafted clones. CD spectra of purified monomers of the 
AE11F (thick solid line), FITC8 (thin solid line), E3 (thick broken line), 
and E6 (thin broken line) scFv in 50 mM sodium phosphate, pH 7.4. 



are tolerated by the scFv frameworks studied here. Further- 
more, in a previous study, we have found that single-codon 
deletions, some of which have also been shown to be functional, 
occur in antibodies belonging to the IGHV3 subgroup at or 
immediately adjacent to position 58 (12). The single-codon 
modifications of antibody sequence space we have presented 
here are in other words highly representative of changes that 
may occur naturally as a consequence of the somatic hypermu- 
tation process. 

As some of the single-codon insertions produced loop lengths 
found in antibodies belonging to the IGHV4 subgroup, we de- 
cided to investigate the possibility of using CDRH1 sequences 
originating from this subgroup to diversify the AE11F scFv. 
This approach resembles evolution through receptor revision, 
which occurs in vivo (36, 37) and has also been shown to 
provide a selection advantage in vitro (38). However, grafting of 
CDRH1 loops of different lengths from the IGHV4 subgroup 
into the IGHV3 framework used by the AE11F scFv resulted 
not only in a loss of the original antigen specificity but also in 
the acquisition of a polyreactive character by the thus modified 
scFv clones, even though they had not been put through a 
potentially denaturing purification process (39) (Fig. 3). This 
polyreactivity is most likely due to a destabilized or inappropriately 
folded V domain, as demonstrated by the CD spectra of 
two of the clones (Fig. 4). Destabilizing effects of loop grafting 
into an antibody framework have been reported previously 
(40), but in that particular case, the grafted sequences were 
totally unrelated to antibody hypervariable loops. The use of 
naturally occurring CDR sequences for grafting into immuno- 
globulin frameworks often ensures that the inserted loops are 
optimally functional as they have been proofread and selected 
for functionality during the formation of the B cell receptors. 
Our data show, however, that the functionality of the grafted 
loops also depends on the framework they are inserted into 
even if they are natural immunoglobulin sequences. The reason 
for the observed effects probably lies in the differences in cer- 
tain key residues between the IGHV3 and IGHV4 frameworks. 
In fact, many of the amino acids that differ between the original 
AE11F sequence and the grafted sequences are residues 
that are used to define the canonical structures (27, 31). In 
addition, Tramontano et al. (41) have shown that framework 
residue 80 of the heavy V domain packs against residues in 
both CDRH1 (position 30) and CDRH2 (position 58) and that it 
is an important determinant of the conformation of the CDRH2 
loop. A subsequent mutational study has also shown that the 
nature of this residue determines the binding characteristics of 
an antibody by influencing the conformation of the heavy chain 
CDR loops (42). The AE11F framework has, like all unmutated 
antibodies belonging to the IGHV3 subgroup, an Arg at posi- 
tion 80, whereas all genes belonging to the IGHV4 subgroup, 
from which the CDRH1 sequences were obtained, encode a Val 
residue at this position in their germline configurations. The 
larger, charged Arg possibly causes clashes with the IGHV4- 
derived residues in and adjacent to CDRH1, which leads to an 
improper fold and poor stability of the resulting scFv product. 

In conclusion, we demonstrate here that single amino acid 
insertions in both CDRH1 and H2 and deletions in CDRH2, 
which are highly representative of modifications that occur 
naturally in regions of the hypervariable loops known to be 
involved in antigen contact (15) during the maturation of B cell 
receptors, are well tolerated and permit production of stably 
folded proteins. This is true despite the fact that the thus 
modified loops do not fulfill the key residue requirements for 
canonical loops of the corresponding length or are of a length 
not associated with a known canonical structure (27). This 
demonstrates the plasticity of antibody V domain frameworks 
belonging to the important IGHV3 subgroup, which makes up 
a large fraction of all human antibodies (43), and their capacity to 
tolerate modifications that expand sequence and structure 
space beyond the limits set by the germline-encoded diversity. 
Based on the similarities with naturally occurring alterations 
of loop lengths, our results with insertions and deletions in 
CDRH1, H2, and L1 of the antibody fragments used in this 
study, and work on an unrelated scFv with a three-amino acid 
insertion at the beginning of CDRH1 (10), 3 our conclusion is 
that both insertions and deletions can be efficiently utilized in 
antibody engineering to expand the structural space available 
to human antibodies as long as attention is paid to key residues 
in the framework (41). As demonstrated by previous studies on 
murine antibodies, this approach can be used for improving 
already existing specificities (17, 44). However, analogously 
with the correlation between CDR loop lengths and the antigen 
recognized (16), it is conceivable that it may also be utilized for 
the construction of antibody libraries specific for a particular 
class of antigens such as haptens, peptides, or large molecules. 
Finally, we hypothesize that introduction of novel loop lengths 
and combinations of loop lengths not encoded by the germline 
repertoire may also enable the targeting of poorly immunogenic 
or previously unrecognized antigens and epitopes as entirely 
new regions of antibody structure space are explored by this 
mode of sequence diversification. 